PREFACE


The  work  reported  here  was  done  for  the  Rand  Urban  Policy  Analysis  Program, 
which  is  sponsored  by  the  National  Science  Foundation.  The  Urban  Program  has 
concentrated  its  efforts  on  research  and  analysis  in  three  American  metropolitan 
areas:  San  Jose,  Seattle,  and  St.  Louis.  The  purpose  of  this  report  is  to  aid  analysts 
in  generalizing  their  results  from  these  cities  to  others,  and  in  selecting  suitable 
cities  for  future  study.  Although  the  work  tried  specifically  to  determine  policy 
relevant  differences  and  similarities  among  American  cities,  the  data  collected  for 
this purpose have proved useful in other cross-sectional studies, including the
continuing analysis of central city decline.

Emmett  Keeler  is  a  member  of  the  Rand  Economics  Department,  and  William 
Rogers is a consultant to The Rand Corporation.

SUMMARY


This study presents a new classification of the 125 largest American metropolitan
areas. Its purpose is to help the Rand Urban Studies Group make generalizations
about the country as a whole from the detailed studies of specific cities. Based on
a data file of important structural variables, the classification shows how representative
an urban area is and is thus useful in weighing evidence and in selecting further
areas for study. The data file is an ongoing resource that is being used in
cross-sectional testing of theories; for example, Appendix D presents a statistical
model of the relationship between population and economic growth.

The  specific  variables  used  here  were  chosen  from  the  experience  of  earlier 
Rand  urban  studies,  which  focused  on  growth  and  decline  of  areas,  social  and  racial 
inequality,  government  and  political  structure,  and  city-suburban  relationships. 
The  resulting  53  variables  measure  these  themes  and  the  urban  geography  that  is 
their  setting.  Tables  of  the  range  of  these  variables  show  the  enormous  diversity  of 
American  urban  areas. 

Factor  analysis  produced  eight  synthetic  variables  or  "factors,"  which  contain 
much  of  the  variation  of  the  53  original  variables.  These  factors  are:  manufacturing, 
income inequality, ghetto-suburb contrasts, growth, age of population, unemployment,
density, and education. Instead of 53 variables, each city may be represented
by  its  eight  factor  scores. 

The metropolitan areas were divided into clusters of cities "similar" in these
eight dimensions. Two types of results are presented. The first is a complete list of
the cities divided into ten clusters, where each cluster can be characterized by its
"most representative" city: Columbia, San Diego, South Bend, Knoxville, Dallas,
Worcester, Cleveland, Oxnard, San Antonio, and Pittsburgh. The second is a tree of the
cluster types where the cities are divided into 4, 6, 8, 10, and 14 clusters. The tree
shows how the larger clusters subdivide. For example, the Southern cities cluster is
divided into mature cities (Columbia) and newer cities without very distant suburbs
(Knoxville); these newer cities divide in turn into those with stable (Charlotte) and
declining (Chattanooga) populations. Although there are no regional variables per se,
the clusters generally represent geographic regions, since regions tend to share a
common history and economic development, which is reflected in the data.

This  work  is  supplemented  by  recent  literature  described  in  Appendix  C.  It  has 
been  shown,  for  example,  that  the  definition  of  urban  area — whether  legal  city, 
urbanized  area,  or  standard  metropolitan  statistical  area — does  not  affect  the  results 
of most statistical analyses. Also, as pointed out by other studies, factor analysis has
great value in exploratory work: Although there is ambiguity about what the factors
mean,  they  have  mathematical  properties  that  save  time  and  reduce  exploratory 
problems.  Appendix  B  discusses  at  greater  length  the  philosophy  and  method  of 
exploratory  as  opposed  to  classical  data  analysis.  Accurate  probability  statements 
are  waived  for  the  chance  to  let  expert  judgment  and  new  findings  interact  with  the 
data.

I.  INTRODUCTION


A  primary  objective  of  the  Rand  Urban  Policy  Analysis  Program  is  to  generalize 
about  national  urban  problems  and  policies.  However,  operational  considerations 
require  most  of  the  effort  to  be  in-depth  studies  of  a  small  number  of  cities.1  The 
problem,  then,  is  how  to  use  the  knowledge  gained  from  these  studies  to  make 
general  conclusions  on  policy  for  the  large  and  diverse  set  of  American  cities.  The 
work  reported  here  is  designed  to  help  in  that  generalization. 

We  will  briefly  describe  our  basic  strategy  for  generalization.  Hypotheses  are 
formed  in  the  course  of  in-depth  studies  of  individual  cities.  These  hypotheses  are 
then  tested  on  other  intensively  studied  cities.  If  the  results  are  consistent,  they  can 
be  extrapolated  by  means  of  statistical  procedures  on  data  from  the  entire  set  of 
large  American  cities. 

There  are  many  differences  between  this  strategy  and  that  of  the  traditional 
scientific  experiment.  These  differences  are  unavoidable,  given  our  aims  and  the 
subject  under  investigation,  but  they  should  be  acknowledged.  First,  because  of  the 
extreme  complexity  and  diversity  of  city  settings  and  interactions  and  the  difficulty 
of  measuring  them,  no  strict  scientific  control  procedures  are  possible.  We  intend  to 
draw  inferences  that  are  plausible  and  consistent  with  our  observations,  but  we  do 
not expect that our findings can be "proved." Another difference lies in the transient
nature of urban knowledge: the phenomena under investigation are changing rapidly,
so that many of the better policies of today may be useless 20 years from now.
A  third  difference  is  caused  by  our  current  lack  of  knowledge  as  to  what  is  important. 
In  contrast  to  scientific  verification,  in  which  the  theory  is  set  out  in  advance  and 
the  good  experiment  has  no  surprises,  our  work  is  sequential.  As  new  hypotheses  are 
formed  or  new  important  variables  discovered,  we  try  to  improve  our  explanations. 
New  analysis  may  suggest  further  new  theories.  We  widen  the  scope  of  our  inquiry 
at  some  expense  of  certainty  in  the  results. 

This  work  is  designed  to  help  generalization  in  three  ways.  First,  we  have 
created  a  data  file  of  variables  that  seem  important  in  theory  and  in  the  specific 
cities  studied.  The  file  can  be  used  directly  in  cross-sectional  testing  of  theories 
developed  for  the  individual  cities  studied;  for  example,  in  Appendix  D  we  give  such 
an  analysis  of  the  relationship  between  economic  and  population  growth  in  major 
urban  areas.  Second,  the  city  classification  reported  here  is  useful  in  selecting  the 

1  To  date,  the  cities  of  San  Jose,  Seattle,  and  St.  Louis  have  been  studied.  The  work  on  San  Jose  is 
complete,  but  work  continues  on  Seattle  and  St.  Louis. 




cities to be studied intensively. Since we can choose only a limited number of cities
to  study  in  detail,  our  aim  is  to  pick  cities  that  will  be  representative  of  most  types 
of  American  cities.  Although  a  balanced  experimental  design  is  impossible  because 
of  the  large  number  of  relevant  parameters  and  our  incomplete  knowledge  of  which 
are  most  important,  if  our  findings  hold  up  on  a  wide  variety  of  cities,  then  they  are 
plausibly  universal.  The  final  aid  to  generalization  lies  in  the  weighing  of  evidence: 
How  idiosyncratic  is  Seattle  or  St.  Louis?  If  certain  phenomena  are  present  there, 
where  else  might  they  appear?  By  highlighting  what  appear  to  be  the  important 
parameters  of  city  variation,  we  can  get  a  better  idea  of  the  range  and  limitations 
of  policy  conclusions  based  on  evidence  from  a  few  cities. 

Since Thorndike's pioneering work in the 1930s, there have been many attempts
at city classification.2 In Appendix C, we discuss how that literature complements
our own work, but it should be noted here that city classifiers today, using
experience and modern computers, have produced some technically excellent studies
and that there are a number of similarities between our classification and others.
Indeed,  there  would  almost  have  to  be,  since  a  classification  that  did  much  violence 
to  common  sense  would  hardly  be  useful  for  policy  purposes,  or  possible  to  justify. 

What,  then,  is  the  need  for  another  study?  First,  our  study  is  up  to  date,  using 
mainly  1970  Census  information.  Since  many  policy  relevant  parameters  of  city 
settings  are  changing  rapidly,  such  timeliness  is  essential.  Second,  most  of  the 
earlier  studies  used  a  limited  set  of  variables,  generally  reflecting  economic  base  or 
demography.  In  contrast,  we  have  used  the  experience  of  early  studies  to  select 
variables  that  bear  on  the  currently  most  important  urban  problems — including 
city-suburban  differences,  local  government,  and  segregation.  Most  earlier  studies 
classified  towns  down  to  a  rather  small  size.  We  use  only  large  metropolitan  areas, 
thus  avoiding  factors  more  relevant  to  small  town  classification.  In  contrast  to  some 
studies  that  seemed  to  collect  data  indiscriminately,  we  selected  variables  carefully 
and normalized them by cost of living deflators, population, and other special
transformations to better reflect our interest in important qualitative differences.

2  E.  L.  Thorndike,  Your  City,  Harcourt,  New  York,  1939. 


II.   THE  DATA 


Our  objective  in  building  the  data  file  was  to  extract  something  useful  and 
manageable from voluminous amounts of data available in various sources. A decision
was made to restrict the data collection to the 125 urban areas with over 250,000
population;3  these  are  pictured  in  Fig.  1.  This  lowers  costs  and  focuses  attention  on 
phenomena  that  relate  to  the  larger  urban  areas,  which  the  Rand  Urban  Policy 
Analysis  Program  was  set  up  to  study.  In  any  event,  we  have  included  most  of  the 
country:  The  125  areas  contain  almost  60  percent  of  the  population  of  the  United 
States.  For  reasons  of  economy,  we  used  mainly  data  that  were  already  collected  and 
easily  available,  such  as  the  1970  U.S.  Census  of  Population  and  the  1967  City  and 
County  Data  Book.  In  some  instances,  we  had  to  impute  missing  values  by  multiple 
regression. 

Variables  were  selected  that  bore  on  some  of  the  major  themes  of  interest  of  the 
first  set  of  Rand  urban  studies.  These  themes  were: 

•  urban  growth  and  decline 

•  prosperity  and  poverty 

•  race  and  ethnic  minorities 

•  city-suburban  relations 

•  government  and  politics 

In  addition,  we  included  variables  giving  policy  settings  and  city  pathology: 

•  demography

•  geography

•  health  and  crime 

The  process  of  building  and  using  the  file  is  on-going;  as  new  important  variables 
are discovered in specific analyses, they are added. A list of the variables and their
sources is given in Appendix A.

After  the  raw  data  have  been  collected,  they  must  be  carefully  normalized  to 
a meaningful form. Is it total income or per capita income that is important? Is
income  itself  or  its  log  more  appropriate?  In  Appendix  B,  we  discuss  problems  of 

3  To  ease  the  collection  of  data,  we  used  the  Standard  Metropolitan  Statistical  Area  (SMSA)  as  our 
definition  of  urban  area.  These  are  general-purpose  units  established  by  the  federal  government,  using 
counties as building blocks.

selection and normalization at greater length, but our general approach has been
to emphasize qualitative differences. Thus, we have used rates and per capita measures
wherever possible.
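
The normalization choices discussed here (totals versus per capita measures, a quantity versus its log) can be sketched as follows. This is a minimal illustration and not the report's actual procedure; the function names and the figures for the two areas are hypothetical.

```python
import math

def per_capita(total, population):
    """Convert an area-wide total into a per-person rate."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total / population

def log_scale(value):
    """Natural log of a positive quantity; compresses very large values."""
    if value <= 0:
        raise ValueError("value must be positive")
    return math.log(value)

# Hypothetical figures for two areas of very different size.
areas = {
    "Area A": {"population": 2_000_000, "total_income": 8_000_000_000},
    "Area B": {"population": 300_000, "total_income": 1_200_000_000},
}

normalized = {
    name: {
        "income_per_capita": per_capita(d["total_income"], d["population"]),
        "log_population": log_scale(d["population"]),
    }
    for name, d in areas.items()
}
```

In this made-up case the two areas differ greatly in total income but are identical in income per capita, which is the kind of qualitative comparison that rates are meant to expose.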

American  urban  areas  are  very  diverse.  New  York  has  three  times  as  many 
dentists  per  capita  as  El  Paso,  and  600  times  as  many  robberies  per  capita  as 
Appleton.  Jacksonville  has  200  times  the  area  of  New  York.  Tampa  has  three  times 
as  many  old  people,  per  capita,  as  Newport  News.  These  nuggets,  gleaned  from  the 
tables  of  the  distributions  of  the  53  variables  in  Appendix  A,  are  representative.  This 
diversity  should  not  be  surprising.  Although  urban  areas  are  forced  by  their  size  and 
density  to  have  many  things  in  common — policemen,  garbage  collection,  traffic,  and 
so  forth — they  are  nevertheless  formed  in  particular  places  and  with  particular 
people  that  span  the  American  scene. 


III.    REPRESENTATION  METHODOLOGY 


TWO  VIEWS  OF  CITY  DIVERSITY 

There  are  two  extreme  ways  of  looking  at  the  diversity  of  cities.  One  view  is  that 
each city is sui generis, hence that unique specific local factors are so important
that nothing can be inferred about one city by studying another. The other is that, for
policy  purposes,  the  same  model  fits  all  cities — the  only  difference  being  different 
values  for  certain  parameters  such  as  size,  income,  percent  of  blacks,  and  the  like. 
In this model, which underlies statistical cross-sectional analyses of urban phenomena,
variations in these parameters have linear additive effects on outcomes. If this
view  were  correct,  it  would  be  statistically  efficient  to  study  cities  with  extreme 
values  of  the  parameters.  The  remaining  cities  could  be  approximated  by  convex 
combinations  of  these  extreme  cities — for  example,  Boston  =  .02  San  Jose  +  .22 
Seattle  +  .02  St.  Louis  +  .02  Little  Rock  +  .4  Philadelphia  +  .32  Cincinnati.4  Effects 
of  various  policies  could  be  estimated  by  adding  the  effects  at  each  city. 
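
The linear additive view can be made concrete with a short sketch. The weights are the ones quoted above for Boston; everything else (the function names and the per-city "effects") is hypothetical and serves only to show how a convex combination would be checked and applied.

```python
# Weights quoted in the text for approximating Boston.
boston_weights = {
    "San Jose": 0.02,
    "Seattle": 0.22,
    "St. Louis": 0.02,
    "Little Rock": 0.02,
    "Philadelphia": 0.40,
    "Cincinnati": 0.32,
}

def is_convex(weights, tol=1e-9):
    """True if all weights are nonnegative and they sum to one."""
    nonnegative = all(w >= 0 for w in weights.values())
    return nonnegative and abs(sum(weights.values()) - 1.0) <= tol

def blend(weights, values):
    """Weighted sum of a per-city quantity under the linear additive view."""
    return sum(w * values[city] for city, w in weights.items())

def squared_error(actual, approx):
    """Sum of squared differences across variables (the 'best' criterion of footnote 4)."""
    return sum((a - b) ** 2 for a, b in zip(actual, approx))

# Hypothetical policy effects in the six study cities (illustrative numbers only).
effects = {
    "San Jose": 1.0, "Seattle": 2.0, "St. Louis": 3.0,
    "Little Rock": 4.0, "Philadelphia": 5.0, "Cincinnati": 6.0,
}
boston_estimate = blend(boston_weights, effects)
```

Finding the weights themselves would require a constrained least-squares fit over the 53 variables, which is not shown here.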

We  feel  that  there  is  a  certain  truth  in  both  views  of  city  diversity,  that  although 
it  is  possible  to  learn  a  lot  about  cities  in  general,  nonlinearities  abound  and  specific 
conditions  do  have  a  great  effect.  Thus,  instead  of  the  most  extreme  cities,  where 
indeed some fairly idiosyncratic things may be happening, we are looking for representative
ones.


A  SYSTEMATIC  APPROACH  TO  REPRESENTATION 

It  is  this  diversity  of  urban  areas  and  the  multiplicity  of  variables  that  can  be 
used  to  describe  them  that  confound  attempts  to  select  a  group  of  representative 
areas.  Yet  research  on  urban  problems,  and  on  policies  to  ameliorate  them,  cannot 
avoid  dealing  with  the  problem  of  representativeness.  The  cost  of  a  thorough  study 
of  even  one  problem  in  one  city  necessitates  the  selection  of  a  subgroup  to  represent 
cities  as  a  whole.  Similarly,  policy  recommendations  on  a  national  scale  necessitate 
generalizations  in  program  design  and  implementation. 

It  is  possible  to  use  an  informal  approach  to  representation,  with  an  intuitive 

4  We  have  computed  each  city's  "best"  expression  as  a  combination  of  the  six  cities  already  proposed 
for  study:  San  Jose,  Seattle,  St.  Louis,  Cincinnati,  Little  Rock,  and  Philadelphia.  Here  "best"  means 
lowest squared error for the 53 variables.

balancing of apparently relevant factors, such as size and region. The approach we
used,  however,  is  city  classification  by  a  method  now  standard  in  the  field.5  It 
involves  two  steps:  The  data  are  organized  first  by  means  of  factor  analysis,  and  the 
cities  are  then  classified  by  means  of  their  factor  scores.6 

Factor  analysis  is  a  data  description  technique  that  reduces  the  dimensionality 
of  a  set  of  interrelated  variables.  It  assumes  that  different  phenomena  may  be  part 
of  the  same  underlying  pattern.  The  technique  was  first  used  by  psychologists  trying 
to  show  that  two  general  intelligence  "factors"  underlay  the  variety  of  testable 
intellectual  activities.  The  term  "factor"  stuck  and  is  used  to  connote  a  synthetic 
variable that accounts for observed relationships among a set of empirically measured
variables.7

Without getting into technical details, we would like to show how factor analysis
enables us to deal systematically with the city classification. Two terms will be
useful: 

•  factor  loading,  the  correlation  between  one  of  the  empirical  variables  and 
a  factor. 

•  factor  score,  the  ranking  or  rating  (expressed  in  standard  deviations)  of  a 
metropolitan  area  on  the  factor  that  represents  a  particular  pattern  of 
variables. 

An example will illustrate how factor analysis reduces the dimensions of variation.
In each of the factor analyses performed with various groups of cities, a small
but significant factor (stage in life cycle) consistently appeared. It was correlated
with a group of variables relating to the age of the population. These included the
birth rate, median age, and percent of housing that was crowded. For this factor,
presented below in Table 6, the score for El Paso, 3.31, was the highest in the
country, and Fort Lauderdale's, -3.27, was the lowest. For each city, one number
represents the whole set of age-related data.
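
The two terms defined above can be illustrated with a small sketch: a factor score is a value expressed in standard deviations, and a factor loading is the correlation between an observed variable and the factor. The data below are hypothetical, and the code does not perform factor analysis itself; it only shows how a loading would be computed once factor scores are in hand.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def standardize(xs):
    """Express values in standard deviations from the mean (population sd)."""
    m = mean(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return [(x - m) / sd for x in xs]

def correlation(xs, ys):
    """Pearson correlation as the mean product of standardized values."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Hypothetical scores on a "stage in life cycle" factor for five areas,
# and a hypothetical median age for the same five areas.
factor_scores = standardize([2.0, 1.0, 0.0, -1.0, -2.0])
median_age = [24.0, 27.0, 27.0, 31.0, 33.0]

# Loading of median age on the factor: strongly negative in this made-up data,
# since the high-scoring areas are the young ones.
loading = correlation(median_age, factor_scores)
```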

The number of factors used to represent the cities is chosen for explanatory
power and intuitive appeal. In our case, the first eight factors explain 64 percent of
the total variance of the 53 empirical variables, and (after a standard transformation)
each has a clear interpretation. The remaining factors explain less of the
variance and do not have a very straightforward interpretation. Each metropolitan
area can be represented in an eight-dimensional space by its factor score coordinates.
Urban areas that are close together in this space are similar, and those that
are far apart are dissimilar. So, we simply divide the areas into "clusters" of
neighboring points.8 The clusters are relatively homogeneous groups of urban areas. The
results and models for one metropolitan area in a cluster can generally be expected
to carry over to other areas in the same cluster; if not, we will want to discover what
important aspects of the urban scene are not covered by our data. By selecting cities
from many different clusters, we should be able to cover a large range of American
cities.

5  For  an  excellent  collection  of  recent  work  and  criticism  of  the  approach,  see  B.  Berry  (ed.),  City 
Classification  Handbook:  Methods  and  Applications,  Wiley,  New  York,  1972. 

6  Appendix B contains a more precise and complete description of factor analysis, the clustering
routine,  and  variable  transformations. 

7  Each  factor  is  a  linear  combination  of  the  original  variables. 

8  Our  program  clusters  the  points  so  that  the  sum  of  the  squared  distances  from  each  point  to  the 
center  of  its  cluster  is  minimized.  The  number  of  clusters  must  be  selected  in  advance. 
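
The clustering criterion in footnote 8 can be sketched in a few lines. The report does not name its routine, so the code below uses Lloyd's k-means algorithm purely as an assumption: it is one common way to seek a local minimum of the within-cluster sum of squared distances for a number of clusters fixed in advance. The points are hypothetical two-dimensional scores rather than the eight-dimensional factor scores of the study.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and center update.

    Reaches a local, not necessarily global, minimum of the within-cluster
    sum of squares. Initial centers are simply the first k points.
    """
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = centroid(members)
    return labels, centers

def within_cluster_ss(points, labels, centers):
    """The quantity being minimized: squared distance of each point to its center."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))

# Hypothetical factor scores for six areas.
scores = [(2.0, 2.1), (-1.9, -2.0), (1.8, 2.3), (-2.2, -1.8), (2.1, 1.9), (-2.0, -2.1)]
labels, centers = kmeans(scores, k=2)
total_ss = within_cluster_ss(scores, labels, centers)
```

Because the result depends on the starting centers, a routine of this kind is usually run from several starting points and the solution with the smallest sum of squares is kept.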


IV.    FACTOR  ANALYSIS  RESULTS 


Table  1  presents  the  factors — the  observed  underlying  patterns  in  the  variables. 
The  factors  are  named  subjectively  from  the  variables  with  high  loadings  on  the 
factor.  The  53  empirical  variables  are  quite  interdependent;  eight  factors  account 
for  64  percent  of  their  variance.  The  factors  are  not  absolutely  general.  Comparison 
with  other  studies,  presented  in  Appendix  C,  shows  that  the  factors  depend  heavily 
on  the  choice  of  variables  and  on  the  group  of  cities  selected.  For  variables  most 
interesting  to  us,  our  factors  show  what  patterns  appear  in  large  American  cities. 

An  example  may  show  the  advantages  and  disadvantages  of  using  one  number, 
the  "growth"  factor  score,  to  replace  the  variety  of  growth-related  variables.9  Little 
Rock is designated an Economic Development Agency growth center and has recently
had much commercial growth. However, its score on growth is -.9, which puts
it  in  the  bottom  quarter  of  cities  studied.  The  low  factor  score  does  not  mean  that 
Little  Rock  is  not  growing  in  any  sense,  but  that  it  is  quite  different  from  the  normal 
pattern  of  American  "growth"  cities:  It  is  losing  population,  its  climate  is  rather 
undesirable,  there  is  great  inequality  in  income,  and  there  has  been  little  increase 
in  manufacturing  production.  Per  capita  income  growth,  the  only  growth  variable 
we  collected  for  which  Little  Rock  is  above  average  nationally,  is  highly  related  to 
poverty  and  loads  mainly  on  the  inequality  factor.  Even  if  income  growth  had  loaded 
on  the  growth  factor,  Little  Rock  would  not  score  high.  Since  the  factor  is  an  average 
of  various  growth  measures,  a  city  must  be  growing  in  most  of  these  measures  to 
obtain  a  high  score.  The  factors  are  chosen  so  that  they  are  orthogonal  to  one 
another. This orthogonality has important effects. It means that each factor contributes
independently to the character of a city, and this makes factors useful in
preliminary statistical model building. However, there are disadvantages for interpretation.
The whole set must be kept in mind. Education, for example, represents
those  educational  characteristics  that  remain  after  we  adjust  for  the  "growth"  or 
"poverty"  factors.  In  some  ways,  the  single  variable  Percent  High  School  Graduates 
is a better measure of educational status than the educational factor. For our purposes,
however, the set of factors locates cities better than a set of corresponding
variables  that  would  be  significantly  intercorrelated. 

9  Too much weight should not be placed on factor names. Population growth correlates .8 with the
factor "growth." Thus (.8)², or .64, of the variance in population growth is predictable from the factor score.
The standard deviation of population growth, after adjustment by the growth factor, is √(1 - .64) =
.6 of its former size. This surprisingly small reduction in predictability is brought out in Tables 2-9. Note
how the rankings based on factor scores do not lead to consistent rankings based on the key variables
in the factor.
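
The arithmetic in footnote 9 can be checked directly. The sketch below assumes only the standard relationship between a correlation and the share of variance it explains; the function names are hypothetical.

```python
import math

def variance_explained(loading):
    """Share of a variable's variance predictable from the factor score."""
    return loading ** 2

def residual_sd_fraction(loading):
    """Standard deviation remaining after adjustment, as a fraction of the original."""
    return math.sqrt(1.0 - loading ** 2)

r = 0.8  # correlation of population growth with the "growth" factor
explained = variance_explained(r)       # about .64
remaining_sd = residual_sd_fraction(r)  # about .6
```

Even with a loading of .8, then, roughly 60 percent of the original spread in population growth remains after the factor is taken into account.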
Table 1

DIMENSIONS OF THE AMERICAN URBAN SYSTEM IN 1970

Factor
Number    Factor Description

  1       Nonmanufacturing Economic Base
  2       Inequality, Poverty, and Segregation
  3       Suburbs and Ghetto Contrasts
  4       Growth
  5       Stage in Life Cycle
  6       Welfare and Unemployment
  7       Density
  8       Education

Tables  2-9  give  the  variables  with  the  highest  loadings  on  each  factor  and  the 
cities  with  the  highest  and  lowest  factor  scores.  The  variables  are  listed  in  order  of 
decreasing loadings. "Number" refers to Table A-1 in Appendix A, which has sources
and applied transformations given in detail. Loadings are given in percentages. A
negative score means low values are associated with the factor. Table A-13 in Appendix A
is a list of all the factor scores.


NONMANUFACTURING  ECONOMIC  BASE 

The  areas  high  on  this  factor  are  government,  recreation,  or  retirement  centers. 
They are less affected by the business cycle but have more unequal income distributions,
possibly reflecting that service jobs are generally lower paying than blue collar
manufacturing jobs. Interestingly enough, these areas did much better than the
manufacturing cities in getting OEO money. Manufacturing cities are mainly located
in the Midwest and Northeast and nonmanufacturing mainly in the sun belt.


INEQUALITY,  POVERTY,  AND  SEGREGATION 

This  factor  picks  up  different  types  of  variables  associated  with  the  deep  South. 
All  measures  in  the  variables  of  local  spending  and  income  have  been  adjusted  for 
cost  of  living,  which  is  lower  for  these  cities  but  not  low  enough  to  bring  the  standard 
of  living  of  their  blacks  up  to  the  national  average.  Local  segregation,  the  Barry 
Goldwater  vote,  total  income,  percent  black,  and  the  Gini  coefficient  are  measures 
of  conservatism  and  inequality.  Perhaps  because  legal  segregation  has  ended,  there 
has  been  much  sorting  out  of  the  races  in  these  areas  by  white  suburbanization. 
Thus,  the  vacancy  rate  in  the  cities  is  high,  as  is  construction  employment.  There 
is  not  the  pattern  of  city  apartment-house  living,  and  indeed  the  SMSAs  include 
some  rural  areas,  so  that  blacks  tend  more  often  to  own  their  residences. 
Table 2

NONMANUFACTURING AREAS

Loadings on Constituent Variables: Factor One

Number(a)   Name                            Loadings

   14       Manufacturing Ratio               -74
   25       Gain Value Added (P. Cap.)        -71
   24       Change Unemployment               -65
    5       New Capital Expenditures          -61
   52       Federal Employees                  59
   42       Anti-Poverty OEO Funds             57
    8       Gini Coefficient                   56

                                        Percent Nonagricultural
Manufacturing Areas(b)       Scores     Employees in Manufacturing

Flint, Michigan               -3.0              46.8
Rockford, Ill.                -2.3              49.1
Wichita, Kansas               -2.1              28.4
Detroit, Mich.                -1.8              37.6
Rochester, N.Y.               -1.7              41.6
Seattle, Wash.                -1.7              24.9

Nonmanufacturing Areas(c)

Washington, D.C.               2.5               3.8
Tucson, Ariz.                  2.2               8.8
Albuquerque, N.M.              2.1               8.6
Salt Lake City, Utah           1.8              16.7
Jacksonville, Fla.             1.7              13.0
Honolulu, Hawaii               1.7               7.4

(a) Number on list in Appendix A.
(b) These areas have the lowest factor scores.
(c) These areas have the highest factor scores.
Table 3

INEQUALITY, POVERTY, AND SEGREGATION

Loadings on Constituent Variables: Factor Two

Number(a)   Name                            Loading

   35       Total Income/Black Income          71
    8       Gini Coefficient                   69
   28       Black Owner Occupied Housing       66
   34       Black Family Median Income        -66
   47       Local Segregation                  64
   53       LBJ Vote                          -60
   44       Infant Mortality                   57
   45       Cost of Living                    -57
    3       Construction Employment            54
   39       Vacancy Rate                       44
   46       Income Growth                      41
   12       Population in 2000                -40

Areas with Great                        Black Family
Inequality(b)                Scores     Median Income

Shreveport, La.                2.8          4635
West Palm Beach, Fla.          2.4          5685
Fort Lauderdale, Fla.          2.4          6676
Jackson, Miss.                 2.2          4824
Little Rock, Ark.              1.9          4898
Charleston, S.C.               1.7          5121
Baton Rouge, La.               1.7          5610

Areas of Relative Equality(c)

San Jose, Calif.              -2.1         10574
Binghamton, N.Y.              -2.1          9558
Lorain, Ohio                  -2.0          8614
Jersey City, N.J.             -1.6          7169

(a) Number on list in Appendix A.
(b) These areas have the lowest factor scores.
(c) These areas have the highest factor scores.

SUBURBS AND GHETTO CONTRASTS

When  most  people  talk  about  the  "urban  crisis,"  this  factor  is  probably  what 
they  have  in  mind.  These  are  big  cities  with  rich  suburbs  on  the  periphery  and  black 
ghettos  in  the  center.  In  some  sense,  they  are  middle-aged  cities.  In  very  young 
western  cities,  the  poorer  people  live  on  the  outskirts,  and  the  richer  live  downtown. 
For  such  cities  as  Cleveland,  the  aging  process  has  hit  the  center,  but  the  suburbs 
are  still  open  and  new.  In  the  denser  and  oldest  cities,  picked  out  by  Factor  7,  the 
suburbs  have  been  urbanized. 


RECENT  GROWTH 

The  cities  that  have  been  growing  over  the  last  few  years  represent  a  new  type 
of  economic  strength — weather  and  space.  Economic  growth  generally  reflects  a  new 
type  of  economic  resource.  In  the  United  States,  the  original  growth  areas  were  ports 
with  access  to  agricultural  markets,  then  came  railroad  and  manufacturing  centers 
located  by  iron  and  coal  sources  and,  as  the  country  spread,  regional  market  places. 
Since  World  War  II,  weather,  beauty,  and  space  have  become  key  considerations — 
valuable  to  retirement  and  to  types  of  new  industry  that  do  not  need  close  ties  to 
minerals  or  older  manufacturing  centers.  The  new  growth  centers  have  been  in  the 
so-called  "sun  belt."  Housing  is  new  and  in  short  supply,  so  rents  are  high.  The 
declining  areas  are  in  the  South  and  in  regions  of  such  declining  industries  as 
mining.  Although  it  is  not  picked  up  in  our  data,  the  same  forces  are  pulling  types 
of  light  industry  that  are  free  to  move  from  the  central  city  to  the  suburbs. 


STAGE  IN  LIFE  CYCLE 

Poor  rural  families  are  bigger  and  younger,  so  on  the  average  blacks  and  people 
with  Spanish  surnames  are  considerably  younger  than  whites.  However,  the  main 
reason  for  difference  in  this  factor  is  migration:  The  oldest  populations  are  either 
the  retirement  communities  of  Florida  or  towns  that  can't  hold  on  to  their  young 
people.  The  youngest  are  either  rapid  growth  centers  or  heavily  Spanish  speaking, 
such as in Texas and California. Within most urban areas, the suburbs are considerably
younger than the central city, because of their attraction for young (even
though  white)  families.  Continuing  migration  allows  areas  to  specialize  in  a  certain 
life  stage — retirement  facilities,  say,  or  suburban  family  housing.  The  area  stays  the 
same  but  the  inhabitants  come  and  go. 


WELFARE  AND  UNEMPLOYMENT 

This is a fairly minor factor, reflecting the fact that unemployment is closely tied
to welfare and hence to local government expenditures. "Spanish" is in the factor
because of its California emphasis; in the 1960s California had many poor
people and fairly liberal welfare.
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Table  4 

SUBURBS AND GHETTO CONTRASTS

Loadings on Constituent Variables: Factor Three

Number(a)  Name                                  Loadings
    30     Income Suburbs/Central City              80
    33     Rent Central City/Suburbs               -72
    32     Crowded Housing Central City/Suburbs     70
    29     Segregated Suburbs                       69
    37     Black, Spanish in Central City           59
    43     Migration into Central City-Suburbs     -57
    18     Median Income                            56
    40     Robbery                                  55
     1     SMSA Population                          53
    12     Population in 2000                       39

Areas with Great Differences              Income Suburbs/
Between City and Suburbs(b)     Scores    Income Central City
Washington, D.C.                   3.3    1.35
Newark, N.J.                       2.4    1.53
Atlanta, Ga.                       2.3    1.27
Detroit, Mich.                     2.0    1.21
Baltimore, Md.                     1.9    1.20
Wilmington, Del.                   1.8    1.33
Cleveland, Ohio                    1.8    1.25

Areas with Undifferentiated Suburbs(c)
Corpus Christi, Texas             -2.2    .95
Tulsa, Oklahoma                   -2.1    .94
Appleton, Wisc.                   -1.8    .94
Duluth, Minn.                     -1.7    .96
Wichita, Kansas                   -1.7    .99

(a) Number on list in Appendix A.
(b) These areas have the lowest factor scores.
(c) These areas have the highest factor scores.
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Table  5 
GROWTH

Loadings on Constituent Variables: Factor Four

Number(a)  Name                       Loadings
     4     Change SMSA Population        82
    13     Years to 1/2 Size (age)      -76
    26     Central City Growth 50-60     74
     9     Climate                       65
    22     Rent                          64
    21     Black with both Parents       63
    38     Spanish                       59
    15     Lacking Plumbing             -50
     6     Gain Value Added (ratio)      43
    41     Burglary                      43

Areas of Rapid Growth(b)    Scores    SMSA Population Growth 60-70 (%)
Fort Lauderdale, Fla.          3.4    85.7
Anaheim, Calif.                3.2    101.8
San Jose, Calif.               3.1    65.8
Oxnard, Calif.                 2.9    89.0
Santa Barbara, Calif.          2.1    41.2
Las Vegas, Nev.                2.0    115.2
Miami, Fla.                    1.7    35.6
San Bernardino, Calif.         1.7    41.2

Declining Areas(c)
Wilkes-Barre, Pa.             -2.1    -1.3
Duluth, Minn.                 -1.8    -4.1
Charleston, S.C.              -1.7    19.4
Johnstown, Pa.                -1.6    -6.4

(a) Number on list in Appendix A.
(b) These areas have the lowest factor scores.
(c) These areas have the highest factor scores.
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Table  6 
STAGE IN LIFE CYCLE

Loadings on Constituent Variables: Factor Five

Number(a)  Name                Loadings
     7     Over 65               -84
     2     Birth Rate             79
    31     City Pop/SMSA          53
    16     Crowding in SMSA       50
    50     No. of Govt. Units    -39

Young Family Areas(b)     Scores    Median Age in Central City
El Paso, Texas               3.3    23.2
Honolulu, Hawaii             2.2    28.1
Newport News, Va.            1.7    24.2
Flint, Mich.                 1.7    25.2

Areas with Many Old People(c)
Fort Lauderdale, Fla.       -3.3    39.2
Tampa, Fla.                 -2.5    37.8
Wilkes-Barre, Pa.           -2.3    37.6
West Palm Beach, Fla.       -2.3    39.3
Miami, Fla.                 -1.8    37.3

(a) Number on list in Appendix A.
(b) These areas have the lowest factor scores.
(c) These areas have the highest factor scores.
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Table  7 
WELFARE AND UNEMPLOYMENT

Loadings on Constituent Variables: Factor Six

Number(a)  Name                Loadings
    23     Unemployment           71
    20     Welfare                70
    51     Local Expenditures     56
    38     Spanish SMSA           51

Areas with Much Welfare(b)    Scores    Unemployment % 1970
Stockton, Calif.                 3.2    8.2
Fresno, Calif.                   3.2    6.5
Bakersfield, Calif.              3.1    6.0
Los Angeles, Calif.              2.3    5.8
San Bernardino, Calif.           2.2    5.9

Areas with Low Welfare(c)
Fort Lauderdale, Fla.           -2.1    2.6
Madison, Wisc.                  -1.9    3.1
Appleton, Wisc.                 -1.5    4.2
Lancaster, Pa.                  -1.5    2.3

(a) Number on list in Appendix A.
(b) These areas have the lowest factor scores.
(c) These areas have the highest factor scores.
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DENSITY 

The  densest  cities  are  the  old  cities  in  the  Northeast.  If  we  had  collected  data 
on  the  percent  of  foreign  born  and  percent  using  public  transportation,  those  varia- 
bles would  also  load  heavily  on  this  factor. 


EDUCATION 

This  is  often  used  as  a  measure  of  social  status,  complementary  to  money.  Since 
education  is  probably  the  most  important  determinant  of  health  (income,  for  exam- 
ple, statistically  has  a  negative  effect  on  health,  when  education  is  controlled  for),10 
both of the health variables are included here. Suburbs generally score higher on
this factor than they do on income differences, since the three main low-education
groups — blacks,  Spanish,  and  ethnics — tend  to  be  in  the  central  city. 

10  Michael  Grossman,  "The  Demand  for  Health,"  Occasional  Paper  119,  NBER,  Columbia  Press,  New 
York,  1972,  Chapter  VI.  His  explanation  is  that  poor  health  may  be  the  result  of  such  typical  attributes 
of  higher-income  life  as  anxiety,  alcohol,  and  cigarettes. 
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Table  8 
DENSITY

Loadings on Constituent Variables: Factor Seven

Number(a)  Name                  Loadings
    27     Density SMSA             91
    60     Density Central City     66
     1     Population               51
    11     One Unit Housing        -50
    40     Robbery rates            49

Dense Areas(b)            Scores    SMSA Population per Sq. Mi.
New York, N.Y.               4.4    18,500
Jersey City, N.J.            3.0    13,700
Chicago, Ill.                2.3    7,850
Indianapolis, Ind.           2.1    8,090
Memphis, Tenn.               1.5    3,940

Areas of Low Density(c)
Oklahoma City, Okla.        -2.1    760
Augusta, Ga.                -2.0    1,040
Mobile, Ala.                -1.9    660
Greenville, S.C.            -1.8    660
Wilmington, Del.            -1.8    1,300
Salt Lake City, Utah        -1.6    1,250

(a) Number on list in Appendix A.
(b) These areas have the lowest factor scores.
(c) These areas have the highest factor scores.
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Table  9 
EDUCATION

Loadings on Constituent Variables: Factor Eight

Number(a)  Name                   Loadings
    17     High School Education     74
    19     College Education         59
    48     Dentists                  54
    22     Rent                      44
    44     Infant Mortality         -31

Areas with Most Educated              % High School
Population(b)               Scores    Graduates
Seattle, Wash.                 2.9    67.8
Minneapolis, Minn.             2.3    66.1
Honolulu, Hawaii               2.3    66.0
Portland, Ore.                 2.2    62.9
Salt Lake City, Utah           2.1    68.5
Anaheim, Calif.                1.6    70.5
Boston, Mass.                  1.6    64.4

Areas with Least Educated Population(c)
Jersey City, N.J.             -2.6    36.3
Gary, Ind.                    -2.4    50.0
El Paso, Texas                -2.1    51.1
Johnstown, Pa.                -1.7    44.1
Birmingham, Ala.              -1.6    45.4
Greensboro, N.C.              -1.6    42.4

(a) Number on list in Appendix A.
(b) These areas have the lowest factor scores.
(c) These areas have the highest factor scores.

V.   CLUSTER  ANALYSIS  RESULTS 


The  eight  factor  scores  give  a  profile  of  each  city.  Although  these  scores  locate 
the  city  with  respect  to  the  national  average,  further  insights  can  come  from  dis- 
tributing the  cities  into  relatively  homogeneous  groups.  To  judge  how  representative 
a  city  is,  we  must  know  whether  there  are  many  cities  like  it,  or  whether  its  profile 
is  unusual.  This  information  will  be  useful  in  selecting  cities  and  in  weighing  contra- 
dictory findings  from  different  cities.  In  addition,  the  clusters  give  information  on 
how  factors  are  interrelated.  Although  they  are  constrained  mathematically  to  have 
zero  correlation  overall,  interesting  combinative  effects  appear  in  the  city  sub- 
groups. As  we  shall  see,  these  effects  are  generally  regional — the  regions  share  a 
climate,  history,  and  economic  development  that  is  reflected  in  the  data. 

The  cities  are  divided  into  homogeneous  groups  as  follows:  Each  city  is  repre- 
sented as  a  point  in  space,  with  its  eight  factor  scores  as  coordinates.  The  distances 
between  cities  have  been  subjectively  weighted  so  that  differences  in  more  important 
factors  have  more  effect  than  differences  in  the  other  factors.11  The  factor  weights 
are  given  underneath  the  factor  names  in  Table  10.  A  program  divides  the  cities  into 
clusters  so  that  a  weighted  sum  of  distances  from  each  city  to  the  center  of  its  cluster 
is  minimized.  We  present  two  types  of  results — the  best  allocation  to  ten  clusters  and 
a  tree  of  clusters  formed  by  joining  the  results  for  different  numbers  of  clusters. 
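
The allocation step just described can be illustrated with a small sketch. It assumes a k-means style iteration on weighted squared distances (the measure reported in Table 10); the program actually used is documented in Appendix B and may differ. The function names, the two-factor toy scores, the weights, and the starting centers are all hypothetical.

```python
from typing import List, Sequence, Tuple

def weighted_sq_dist(a: Sequence[float], b: Sequence[float],
                     w: Sequence[float]) -> float:
    """Weighted squared distance between two factor-score profiles."""
    return sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w))

def assign(points, centers, w) -> List[int]:
    """For every city, the index of the nearest cluster center."""
    return [min(range(len(centers)),
                key=lambda k: weighted_sq_dist(p, centers[k], w))
            for p in points]

def update(points, labels, centers) -> List[List[float]]:
    """Move each center to the mean profile of its member cities."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(list(old))  # keep an empty cluster where it was
    return new_centers

def cluster(points, centers, w, max_iter: int = 100) -> Tuple[list, list, float]:
    """Alternate assignment and center updates until membership is stable."""
    labels = assign(points, centers, w)
    for _ in range(max_iter):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers, w)
        if new_labels == labels:
            break
        labels = new_labels
    total = sum(weighted_sq_dist(p, centers[lab], w)
                for p, lab in zip(points, labels))
    return labels, centers, total

# Hypothetical scores of four cities on two factors (not taken from the data file).
scores = [[1.2, 0.1], [1.0, 0.3], [-0.9, 0.2], [-1.1, -0.2]]
weights = [1.0, 1.5]                  # hypothetical subjective factor weights
start = [[1.0, 0.0], [-1.0, 0.0]]     # hypothetical starting centers
labels, centers, total = cluster(scores, start, weights)
```

Because such an iteration only reaches a local minimum, different starting centers can move a few cities between clusters while leaving the total distance almost unchanged.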


AMERICAN  CITIES  IN  TEN  CLUSTERS 

Table  10  gives  the  best  ten-cluster  results.  The  top  row  can  be  interpreted  as 
follows:  The  398.5  is  the  total  weighted  squared  distance  from  cities  to  the  center 
of their clusters. The 1.0, 1.5, and so forth are the subjective weights assigned to the
factors. The next line shows that there are 16 cities in the first cluster, that their
combined weighted squared distance is 48.95, that their mean score on the
inequality factor is 1.27 standard deviations above average, that their mean on the
ghetto-suburb factor is .17, and so forth. The next line shows that Columbia, S.C., is
closest  to  the  mean  for  the  cluster  (only  .73  away)  and  that  its  score  on  nonmanufac- 

11  For discussion of how the factor weights are determined, and of the clustering algorithm, see
Appendix  B. 
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0.5 

Toledo 

OH 

0.7 

+ 

Akron 

OH 

0.8 

Fort  Wayne 

IN 

0.8 

+ 

Grand  Rapids 

MI 

1.0 

Lansing 

MI 

1.0 

+ 

Bridgeport 

CT 

1.2 

+ 

Davenport 

IA 

1.2 

Canton 
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2.9 

+ 

+ 

+ 

Orlando 

FL 

3.4 

+ 

West  Palm  Beach 

FL 
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VA 

*K  7 
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Average 
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.4 
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PA 
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- 

Buffalo 

NY 

0.8 
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NY 

0.9 

_ 
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WI 

1.1 
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MA 

1.2 

Providence 
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Cincinnati 

OH 

1.3 

- 

New  Haven 

CT 

1.4 

Philadelphia 

PA 

1.7 

+ 

+ 
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OH 

1.8 

+ 

Louisville 

KY 

2.1 

+ 

+ 
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MO 

2.2 

+ 

Boston 

MA 

2.3 
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++ 

Minneapolis 

MN 

2.6 

++ 

Denver 

CO 

2.6 

+ 

+ 
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Harrisburg 

PA 

3.1 

+ 
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CA 

3.6 

+ 

++ 

+ 
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OR 

3.7 

+ 

++ 

Jersey  City 

NJ 

11.5 

++ 

New  York 

NY 

12.6 

++ 

+ 

++ 

(a) The symbols in the table show the relative position of the cities in the cluster:
    -    .6 to 1.19 standard deviations below the cluster average
    --   1.2 or more standard deviations below the cluster average
    +    .6 to 1.19 standard deviations above the cluster average
    ++   1.2 or more standard deviations above the cluster average.

(b) Given in standard deviations from national average.
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turing  is  (.23  +  .03)  =  .26.  The  numbers  are  relative  to  the  mean  for  the  cluster. 
Thus  Columbia's  score  on  the  inequality  factor  is  1.54. 

At  the  bottom  of  the  list,  the  cities  may  be  less  representative  of  their  cluster 
or,  as  is  Fort  Lauderdale  in  cluster  5,  extreme  cases  of  what  the  cluster  represents. 
By  looking  at  the  residuals,  we  get  an  idea  of  how  such  cities  differ  from  the  main 
body  of  the  cluster.12  Thus,  in  cluster  10,  New  York  and  Jersey  City  are  much  more 
dense  than  the  rest  of  these  dense  cities.  If  more  clusters  were  allowed  they  would 
break  off  into  their  own  two-city  cluster.  As  it  is,  every  city,  no  matter  how  special, 
must  go  somewhere. 

We  shall  discuss  the  satisfactoriness  of  the  clustering  after  we  describe  the  ten 
clusters.  Cluster  1  and  cluster  4  are  southern  cities.  Those  in  cluster  1  are  older  and 
less dense with differentiated suburbs. Cluster 2 contains the less prosperous California
cities. They are very high on unemployment and welfare. Their growth rate
is above average, but it is nowhere near as fast as that of the cities in cluster 8, the California
boom  towns.  Many  of  these  cluster  2  cities  are  farm  centers,  with  less  education  and 
manufacturing  than  cities  in  cluster  8.  Cluster  3  contains  the  manufacturing  cities 
of  the  midwest.  In  cluster  5  are  the  Texas  and  Florida  growth  cities.  They  are  poor 
and  black  and  have  older  residents  than  the  average  city.  Cluster  6  consists  of  the 
declining,  white,  smaller  Northeastern  cities.  In  cluster  7  are  the  big  cities  with 
large  black  ghettos  ringed  by  prosperous  white  suburbs.  The  other  big  cities,  in 
cluster  10,  are  somewhat  denser,  and  have  suburbs  that  are  either  smaller  or  more 
like  the  central  cities.  Cluster  9  is  made  up  of  the  nonmanufacturing  cities.  They 
are  young,  growing,  and  mainly  in  the  southwest. 

The  ten-cluster  solution  presented  here  was  the  most  satisfactory  one  produced. 
It  contains  few  incongruities  as  seen  by  urban  experts.  Some  incongruities  are 
unavoidable  since  there  are  cities  that  are  unique  in  ways  in  which  we  have  not  been 
able  to  collect  data.  We  would  hope  that  many  of  these  would  be  on  the  edges  of  the 
clusters.  The  clusters  are  in  no  sense  uniquely  determined;  many  transitions  are 
possible  with  little  effect  on  the  total  score.  For  example,  Harrisburg  is  in  most  ways 
more  like  the  cities  in  cluster  5  than  those  in  cluster  9;  the  main  difference  is  in 
suburb  differentiation.  In  different  runs,  the  types  of  clusters  generally  stay  the 
same,  but  a  few  cities  in  the  clusters  may  change. 

One  problem  with  the  factor-cluster  method  is  that  it  is  an  average  classifica- 
tion.13 Although  it  may  have  some  relevance  to  many  urban  problems,  it  does  not 
classify  areas  exactly  according  to  any  one  problem.  This  is  what  we  want  in  our 
selection  of  representatives;  but  if  we  were  trying  to  study  one  particular  problem, 
we  might  return  to  the  data  file  to  select  cities  precisely  on  the  variables  affecting 
that  problem. 


A  TREE  OF  CLUSTERS 

The  clustering  analysis  was  repeated  with  4,  6, 8, 10,  and  14  clusters.  The  result 
is  the  somewhat  hierarchical  clustering  shown  in  Figure  2.  Each  cluster  is  identified 

12  We  have  computed  scatter  plots  of  the  cities  on  pairs  of  factors,  which  show  more  clearly  the 
relation  of  cities  to  clusters. 

13  C.  A.  Moser  and  W.  Scott,  British  Towns,  Oliver  and  Boyd,  Edinburgh,  1961. 
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by  its  salient  characteristic,  its  most  representative  city,  and  the  number  of  cities 
in  the  cluster.  It  is  not  strictly  hierarchical,  since  the  new,  more  specialized  clusters 
that  form  in  lower  levels  may  pick  up  cities  from  other  more  general  clusters.  For 
example,  when  the  big  cities  with  ghettos  on  the  #10  line  split  into  northern 
manufacturing  and  southern  nonmanufacturing  cities,  they  add  Akron,  Bridgeport, 
and  Dayton  from  the  manufacturing  cluster,  and  Norfolk  from  the  nonmanufactur- 
ing cluster.  The  tree  clearly  brings  out  the  regional  basis  of  city  differentiation.  The 
cities  are  clustered  in  terms  of  manufacturing  or  nonmanufacturing,  inequality  or 
affluence,  young  or  old,  big  or  small;  but  most  are  just  as  accurately  described  as 
Florida,  California,  or  New  England  clusters. 


VI.  CONCLUSIONS 


We  have  tried  to  make  a  simple  characterization  of  the  125  U.S.  cities  with 
populations  greater  than  250,000.  The  first  step  was  to  determine  a  set  of  variables 
that  measures  their  diversity;  we  focused  on  a  total  of  53,  covering  growth  and 
decline  of  areas,  social  and  racial  inequality,  government  and  political  structure, 
and  city-suburban  relationships.  Clearly,  any  such  set  of  variables  contains  a  good 
deal of redundancy, so the next step was to eliminate it. We used factor analysis to
find  eight  variables  that  were  linear  combinations  of  the  original  53  and  account  in 
some  sense  for  most  of  their  variability.  Thus,  each  city  was  represented  as  a  point 
in  eight-dimensional  Euclidean  space.  To  determine  which  of  these  points  are  close, 
we  applied  a  clustering  algorithm  in  generating  ten  fairly  homogeneous  groups. 
These  groups  are  mainly  regional,  since  the  common  history  and  economic  develop- 
ment of  regions  are  reflected  in  the  data. 

Alford argues that a major problem of classifications is not that they are useless,
but that in practice no one uses them.14 Our work has already been used in the selection
of  these  future  sites  for  study:  Little  Rock,  Cincinnati,  and  Philadelphia.  We  hope 
that  a  variety  of  useful  models,  such  as  the  one  presented  in  Appendix  D,  can  be 
developed  and  tested.  It  seems  to  be  a  very  cost-effective  investigatory  procedure. 

14  R.  Alford,  "Critical  Evaluation  of  the  Principles  of  City  Classification,"  in  B.  Berry  (ed.),  City 
Classification  Handbook. 
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Appendix  A 
DATA  SOURCES  AND  REMARKS 


Table A-1
LIST OF VARIABLES(a)

Number  Source  Table     Description of Variable
     1  SA      3         Population (log people)
     2  VS                Birth rate, 1968 (per 1000 population)
     3  SA      71        Construction employment (% non-agr. empl.)
     4  SA      5         Population change, 1960-70 (log (70 pop/60))(b)
     5  CCD3    70        New capital expenditures 1963 (log $ per capita)
     6  SA      192, 197  Gain in value added (ratio 1967/1963)
     7  SA      20, 43    Over 65 years old (% pop)
     8  USC     89        Gini coefficient (a large value for inequality)
     9  CCD4    341       Climate index (subjective desirability regressed on temp. and rain)(b)
    10  SA      11        Area of central city (log sq miles)
    11  SA      96        One unit structures (% year round units)
    12                    Population of year 2000 Metropolis (log people)(b)
    13                    Years since city was half of present size (log)
    14  SA      67        Manufacturing ratio (% non-agr. empl.)
    15  SA      91        Lacking plumbing, housing (% occupied units)
    16  SA      92        Crowded housing (% households)
    17  USC     83        High school graduates (% pop)
    18  USC     89        Family median income ($1000)
    19  USC     83        College graduates (% pop)
    20  SA      165       Welfare, 1971 (AFDC as % pop)
    21  USC     90        Black children living with both parents (% black kids)
    22  SA      108, 118  Rent (monthly median in $)
    23  SA      80        Unemployment (% work force)
    24  SA      78        Unemployment increase 1969-70 (% work force)
    25  SA      192, 197  Gain in value added per capita (1967/1963 in $1000)
    26  SA      14        Population change 1950-60 in central cities (log (60 pop/50))(b)
    27  SA      11, 33    Density of population (log people per sq mile)
    28  SA      94        Black owner occupied housing (% total occupied units)
    29  SA      36, 40    Segregation of suburbs from city (% complete segregation)(b)
    30  USC     89        Income ratio, suburbs/central city (median family incomes)
    31                    City population (% SMSA pop)(b)
    32  SA      92, 103   Crowding (% central cities - % suburbs)
    33  SA      108, 118  Rent (monthly median, central cities - suburbs)
    34  USC     94        Black family median income ($1000)
    35  USC     89, 94    Income (white/black, family median)
    36  VS                Nonwhite infant mortality - white (% of births, 1967)(b)
    37  USC     81        Black and Spanish in main city (% of pop)
    38  USC     81        Spanish (% of pop)
    39  SA      99, 100   Vacancy rate in central cities housing (% units)
    40  SA      180       Robbery per capita (per 1000)
    41  SA      182       Burglary per capita (per 1000)
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Table  A-l  (Cont.) 
LIST OF VARIABLES

Number  Source  Table     Description of Variable
    42  SA      160       OEO antipoverty funds allocated ($ per capita)
    43                    Relative migration, 1960-70 (main city rate - suburb)(b)
    44  VS                Infant mortality, 1967 (infant deaths per births)
    45                    Cost of living, 1969 (expenses for intermediate family of 4/$9000)
    46  SA      5, 64     Income growth, 1959-69 (annual rate of per capita income growth)
    47                    Taeuber's segregation index, main city 1960 (computed block by block)(b)
    48  SA      58        Dentists (# per 100,000 pop. in 1969)
    49  MY                Type of government (weak mayor=0, stronger=1, 2, comm.=3, manager=4)
    50  SA      140       # governmental units, 1967 (normalized by pop, log)(b)
    51  SA      146       Local direct general expenditure (log $ per capita)
    52  SA      152       Federal government employees, 1969 (% pop)
    53  SA      133       LBJ 1964 vote (% total presidential vote)(b)

(a) Unless stated otherwise, data are for the SMSA in 1970.


SOURCES:  MY = Municipal Yearbook, ICMA, Chicago, 1972.  SA = Statistical Abstract, 1971,
pp. 830-889.  CCD3 = City and County Data Book, 1967, Section 3.  CCD4 = City and County Data
Book, 1967, Section 4.  USC = Census of Population of General Social and Economic
Characteristics, 1970.  VS = Vital Statistics of the U.S., 1967.

(b) Variable

4.     Based  on  Area  of  1970  SMSA. 

9.     Ten  people  were  asked  to  rank  ten  major  cities  on  a  scale  of  1-5  for  desirability  of 
climate.     The  average  ranking  was  regressed  against  summer  and  winter  mean  noon  temperature  and 
annual  rainfall.     Desirability  =  .077  Winter  -  .030  Summer  -  .035  Rainfall. 

12.  Jerome  P.  Pickard,  U.S.  Metropolitan  Growth  and  Expansion,   1970-2000  with  Population 
Projections ,  Urban  Land  Institute,  Washington,  1971,  Tables  III-6  through  III-8,  with  the  low 
census-E  projections  of  birth  rate.     There  are  three  Megapolitan  areas — Atlantic  Seaboard,  Lower 
Great  Lakes,  and  California — and  other  smaller  areas. 

13.  Generated  at  Rand  from  the  1950  Statistical  Abstract,   the  Encyclopedia  Britannica  and 
some  guesswork. 

26.     Data  based  on  1970  areas  of  cities. 

29.     Percent  of  whites  who  must  move  to  make  percent  whites  equal  in  city  and  suburb,  as 
a  fraction  of  the  percent  who  must  move  to  integrate  the  city  if  segregation  were  total. 
31.     1971  Statistical  Abstract,  p.  21. 

36.  Since some cities have very small nonwhite populations, an experimental Bayes
technique was used.  The corrected infant mortality rate was (Nonwhite Infant Deaths in 1967 + 9)/
(Nonwhite Births in 1967 + 250).  It is essential that 9/250 = .036, the national nonwhite ratio
of infant deaths to births.

43.     Data  from  the  working  file  of  P.  A.  Morrison,  Rand. 

45.  The 1970 Statistical Abstract, p. 346, gave the 1967 estimated costs of living for
an urban family of four in 34 of our 125 metropolitan areas.  For the other 91, we used the
regressed estimate, Cost of Living = 6126 + .536 (Per Capita Income) + 27.3 (Latitude) - 7.18
(% Minority) - 309 (if in South).  This had an R^2 of .76 for the 34 cities.

47.    Taken  from  K.  E.  Taeuber  and  A.  F.  Taeuber,  Negroes  in  Cities,  Aldine,  Chicago,  1965, 
p.  32.    The  index  computes  segregation  in  1960  as  in  variable  29,  but  on  the  basis  of  census 
blocks,  rather  than  just  city  and  suburbs. 

50.    Normalized  by  dividing  by  the  square  root  of  population.    This  is  supposed  to  allow 
for  a  naturally  greater  number  of  governments  where  there  is  a  greater  number  of  communities 
coming  together. 

53.     A  measure  of  conservatism.     In  this  election,  there  were  few  minor  party  votes. 
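
The arithmetic in notes 9, 36, 45, and 50 can be restated as a short sketch. The coefficients are those given in the notes; the function and argument names are hypothetical, and the units of the inputs (for example degrees for temperature or dollars for income) are assumptions, since the notes do not state them. The logarithm that Table A-1 mentions for variable 50 is left out because the notes do not say at what point it is applied.

```python
import math

def climate_desirability(winter_temp: float, summer_temp: float,
                         rainfall: float) -> float:
    """Note 9: Desirability = .077 Winter - .030 Summer - .035 Rainfall."""
    return 0.077 * winter_temp - 0.030 * summer_temp - 0.035 * rainfall

def adjusted_nonwhite_infant_mortality(deaths: float, births: float) -> float:
    """Note 36: (deaths + 9) / (births + 250).

    With very few births the rate is pulled toward 9/250 = .036,
    the national nonwhite ratio; with many births the data dominate.
    """
    return (deaths + 9) / (births + 250)

def estimated_cost_of_living(per_capita_income: float, latitude: float,
                             pct_minority: float, in_south: bool) -> float:
    """Note 45: regression estimate used for the 91 areas without a published figure."""
    return (6126 + 0.536 * per_capita_income + 27.3 * latitude
            - 7.18 * pct_minority - (309 if in_south else 0))

def normalized_government_units(units: float, population: float) -> float:
    """Note 50: number of governmental units divided by the square root of population."""
    return units / math.sqrt(population)
```

For instance, an area with no recorded nonwhite births would be assigned exactly the national ratio of .036 rather than an undefined rate.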
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The  diversity  of  American  urban  areas  is  shown  in  Tables  A-2  through  A-10, 
tables  of  empirical  variables.  Five  evenly  spaced  points  on  the  distribution  of  125 
areas  are  given  for  each  variable:  The  lowest,  32nd  lowest  (25  percent),  63rd  lowest 
(median),  94th  lowest  (75  percent),  and  highest  values  of  the  variable.  The  city  with 
that  particular  value  is  also  shown.  These  order  statistics  are  preferred  to  the  mean 
and  standard  deviation  because  of  their  insensitivity  to  scaling  and  extreme  values. 
The  numbers  speak  for  themselves. 
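
A minimal sketch of how the five order statistics can be picked out follows. It assumes the values for one variable are held in a mapping from area code to value; the function name and the synthetic area codes are hypothetical.

```python
def five_point_summary(values_by_area: dict) -> list:
    """Return (value, area) at five evenly spaced ranks.

    For 125 areas these are the lowest, 32nd lowest, 63rd lowest,
    94th lowest, and highest values, as in Tables A-2 through A-10.
    """
    ranked = sorted((value, area) for area, value in values_by_area.items())
    n = len(ranked)
    # Zero-based positions 0, 31, 62, 93, 124 when n = 125.
    positions = [round(q * (n - 1)) for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
    return [ranked[i] for i in positions]

# Synthetic example: 125 areas whose values are simply 1 through 125.
example = {f"A{i:03d}": float(i) for i in range(1, 126)}
summary = five_point_summary(example)
```

Because only ranks are used, rescaling a variable (taking logs, say) changes the reported values but not which areas are reported.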

Table A-11 shows the skewness and outliers in the data. In this table +, ++,
and +++ represent high outliers of the variables more than one, two, or three
standard deviations above average; -, --, and --- indicate low outliers. It is
apparent from the table that certain cities, such as Fort Lauderdale or New York,
are extremely different from the average on many variables.
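
The outlier coding can be sketched as follows. The function name is hypothetical, and the mean and standard deviation are passed in because the text does not say how they were computed for Table A-11.

```python
def outlier_symbol(value: float, average: float, std_dev: float) -> str:
    """Code a value by how many whole standard deviations it lies from average.

    One symbol for more than one standard deviation, two for more than two,
    three for more than three; plus signs above average, minus signs below.
    Values within one standard deviation get an empty string.
    """
    z = (value - average) / std_dev
    count = sum(abs(z) > threshold for threshold in (1, 2, 3))
    return ("+" if z > 0 else "-") * count
```

A value of 13.5 on a variable with average 10 and standard deviation 1 would be coded "+++", and a value of 10.5 would carry no symbol.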

Table  A-12  gives  the  correlation  coefficients  between  each  pair  of  variables. 
These  coefficients  measure  the  observed  simple  linear  relationships  in  the  data. 
These  relations  may  be  accidental,  or  they  may  reflect  the  implications  of  true  cause 
and  effect.  A  positive  coefficient  means  that  variables  are  directly  related,  and  a 
negative value indicates an inverse relationship. The closer the correlation is to +1 or -1,
the  closer  one  variable  is  to  being  a  linear  transformation  of  the  other.  To  interpret 
such  relationships,  we  must  control  for  the  influence  of  other  variables,  as  our  later 
multivariate  analysis  does.  It  should  be  noted  that  the  coefficients  measure  only  the 
degree  of  linear  relationship;  significant  nonlinear  relationships  may  exist  but  still 
yield  small  coefficients. 
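
The last point can be made concrete with a short sketch of the simple (Pearson) correlation coefficient. The function name and the toy data are hypothetical; they are not drawn from the data file.

```python
import math

def pearson(x: list, y: list) -> float:
    """Simple linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
linear = [3 * v + 1 for v in x]     # exact linear transformation of x
parabola = [v * v for v in x]       # exact but nonlinear function of x

r_linear = pearson(x, linear)       # equals 1 up to rounding
r_parabola = pearson(x, parabola)   # equals 0 although y is determined by x
```

The second case shows why a small coefficient in Table A-12 does not by itself rule out a strong relationship between two variables.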
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Table  A-2 
URBAN GROWTH AND DECLINE(a)

Variable Name                       No.   Lowest      25 Percent  Median      75 Percent  Highest
Construction Employment               3   2.7 JC      4.1 SBD     4.8 SF      5.8 DLS     13.6 FL
  (percent nonfarm employment)
Population Change 1960-70 (%)         4   -6.4 JHN    9.5 SPD     16.1 GRO    24.5 SLC    115.2 LV
New Capital Expenditures              5   8 ASN       35 SLC      60 CNI      79 DYN      416 HNN
  (1963 $ per capita)
Ratio Value Added 1967/1963           6   .98 LRN     1.27 CLD    1.37 KC     1.45 PRD    2.51 FL
Gain in Value Added per capita       25   -184 LRN    160 BLE     311 STN     449 TLA     1,447 BMT
  ($ 1967/1963)
Population Change in Central         26   -15 WB      -2 SYE      14 SBD      41 ASN      368 TCN
  Cities 1950-60 (%)

(a) See Table A-11 for city abbreviations.


Table A-3
INCOME AND EDUCATION

Variable Name                       No.   Lowest      25 Percent  Median      75 Percent  Highest
Gini Coefficient (Families, SMSA)     8   .286 LRN    .319 ANM    .337 SL     .361 LA     .424 WPB
Family Median Income(a)              18   8,035 ALN   9,585 SLS   10,262 CNN  10,749 LV   12,556 WSN
Per capita Income Growth             46   26.5 OXD    61 FRO      68 SPE      75 MBE      119 AGA
  (1959-60, %)
Cost of Living (Family of four, $)   45   7,820 CC    8,440 SB    8,960 YNN   9,180 TRN   10,330 HNU
High School Graduate (%)             17   36.3 JC     50.7 KNE    54.2 SBD    60.1 MBE    71.3 SBA
College Graduates                    19   3.3 MBE     9.3 PRA     10.8 BRT    13.2 PTN    23.4 WSN

(a) Adjusted by cost of living.
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Table  A-4 
POVERTY: HOUSING AND UNEMPLOYMENT

Variable Name                       No.   Lowest      25 Percent  Median      75 Percent  Highest
Lacking Plumbing Housing (%)         15   .6 ANM      2.4 LRN     3.1 MLE     4.2 UTA     11.8 CHN
Crowded Housing (% Households)       16   3.4 RDG     6.1 ANM     7.1 FL      9.2 CLA     19.9 HNU
Rent(a) (Monthly Median)             22   53 JHN      78 BFO      92 ALE      107 FLT     142 SJ
Unemployment (% 1970)                23   2.1 RCD     3.5 OC      4.2 DLH     5.4 BFO     9.5 SEE
Unemployment Increase 1969-70        24   0 DM        .7 OMA      1.1 BSN     1.3 CLD     5.5 SEE
Welfare (AFDC as % Population 1970)  20   1 APN       3.5 ASN     5.5 HNU     6.5 CLD     14 BNN

(a) Adjusted for cost of living.


Table A-5
RACE AND RACIAL DIFFERENCES

Variable Name                       No.   Lowest      25 Percent  Median      75 Percent  Highest
% Black Children Living with Both    21   29 WB       55 MDN      57 NO       60 UTA      81 HNU
  Parents
Black Owner Occupied Housing         28   0 APN       1 HNN       3 HRG       7 CLD       16 JCN
  (% All Housing)
Black Family Median Income           34   4,635 SHT   6,177 SA    6,779 RDG   7,329 SBD   10,574 SJ
Black, Spanish in Main City (%)      37   1 APN       14 YRK      25 LR       35 HRD      73 WSN
Spanish in SMSA (%)                  38   0 ALN       1 NN        1 SPD       6 DM        25 MMI
Median Income/Black Median           35   1.03 BNN    1.43 PHA    1.51 DVT    1.61 SYE    1.94 SAT
  (Families)
Adjusted Nonwhite Infant Mortality   36   -.3 CC      1.1 LV      1.5 DYN     2.0 RDG     3.1 BNN
  minus White (% Births)
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Table  A-6 
SUBURBAN  AND  CITY  CONTRASTS 


Variable                        Variable  Lowest     25                    75         Highest
Name                            Number    Value      Percent    Median     Percent    Value

Income Ratio                    30        .891       1.008      1.063      1.142      1.532
  Suburb/City (a)                         BKD        DVT        GR         RDG        NWK

Monthly Rent                    33        -30        -4         8          22         54
  Suburb-City ($)                         TLA        JHN        CNN        BLE        DTT

Crowded Housing                 32        -2         -.3        0          .3         3.5
  % City-Suburb                           ALE        RDG        WB         JC         NWK

City Population                 31        9          27         38         51         100
  (% SMSA Population)                     SB         WSN        DLH        HNU        JCE

Taeuber's Index of              47        72         80         87         92         98
  Segregation 1960                        SJ         TCA        ERE        FWE        FL
  (100=complete)

Vacancy Rate                    39        0          5          6          7          10
  City Housing (% Units)                  JHN        LA         FRO        FW         FL

Relative Migration              43        32         15         7          -7         -46
  (Total City Rates)                      AGA        BRM        SPE        OXD        KNE
  (1960-70)

(a) Jacksonville, whose central city is its SMSA, is not considered
to have suburbs in our data. It has been given the average ratio,
where appropriate.


Table A-7
GOVERNMENT AND POLITICS (a)


Variable                        Variable  Lowest     25                    75         Highest
Name                            Number    Value      Percent    Median     Percent    Value

Governmental Units in           50        4          37         75         137        704
  SMSA, 1967 (Ranks                       HNU        CHN        WPB        WCA        PTG
  adjusted for population)

OEO Anti-Poverty                42        1.5        5.6        7.7        10.8       44
  Funds ($ per capita)                    ANM        OXD        TLO        TMA        JCN

Local Direct General            51        91         203        225        270        459
  Expenditure                             HNU        BR         NSE        TCN        STN
  ($ per capita)

LBJ Presidential                53        11         53         62         67         82
  Vote, 1964 (%)                          JCN        CHE        SLS        FLT        PRE

Federal Government              52        .3         .6         1.0        1.8        10.9
  Employees (1969)                        GRY        ANM        CLD        PHA        WSN
  (% Population)

(a) Two other variables are discrete: 29 areas are considered to
have "Congressional Power," by virtue of being represented by a congressman
or senator who heads a major committee; 43 of the central cities are
led by a manager, 15 are governed by a commission, and the rest have mayors.


Table A-8
URBAN DEMOGRAPHY

Variable                        Variable  Lowest       25                        75           Highest
Name                            Number    Value        Percent      Median       Percent      Value

Population,                               250,000      320,000      541,000      1,013,000    11,529,000
  SMSA, 1970                              SLS          LNR          NSE          TMA          NY

Population Density              27        650          1,880        2,460        3,300        18,540
  (SMSA) (People per                      DLH          DLS          SBD          PRA          NY
  square mile)

Birth Rate                                13.6         16.9         18           19.1         28.9
  (1968 per 1000)                         FL           NH           BKD          HSN          EP

Over 65                                   7            8.5          9            10           20
  (%)                                     NN           DYN          SBA          BFO          TMA


Table A-9
URBAN GEOGRAPHY


Variable                        Variable  Lowest       25                        75           Highest
Name                            Number    Value        Percent      Median       Percent      Value

Climate                                   -23          -14          -10          -1           24
  Index                                   DLH          LNG          MMS          ACA          HNU

Area of Central City            10        4.5          24           44           71           769
  (square miles)                          YRK          KNE          SPD          TCN          JCE

One Unit Structures             11        13.5         63.3         72.7         77.6         85.4
  (% Units)                               JC           MNS          SLC          APN          BMT

Expected Population of          12        220,000      3,600,000    35,000,000   52,000,000   57,000,000
  Year 2000 Megalopolis                   DLH          MNS          LA           CHO          NY

Age of City (years              13        8            26           50           70           180
  since one half                          SJ           SPE          LR           DLH          CHN
  present population)

Manufacturing Ratio             14        4            18           27           35           49
  (% nonfarm employees)                   WSN          TMA          LNG          UTA          RKD


Table A-10
HEALTH AND CRIME

Variable                        Variable  Lowest     25                    75         Highest
Name                            Number    Value      Percent    Median     Percent    Value

Infant Mortality 1967           44        1.2        1.9        2.1        2.9        3.0
  Deaths/Births (%)                       YRK        SF         PTG        MMI        CLA

Dentists (per                   48        326        465        536        642        957
  million, 1969)                          EP         CNN        HRG        OMA        NY

Robbery Rate (per 1000          40        .1         .8         1.23       1.83       6.6
  population, 1969)                       APN        YRK        CNN        JC         NY

Burglary Rate (per 1000         41        2.75       8.6        12.3       15         22.
  population, 1969)                       WB         BFO        NH         BKD        JCE

Table A-13
FACTOR SCORES

(Column headings are rotated in the original and only partly legible: Inequality,
Ghetto-Suburbs, Stage in Life Cycle, Welfare, Density. Columns are numbered here
in printed order. A "?" marks an illegible character.)

CITIES                      (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)

AKRON               OH     -0.9   -0.7    0.4   -0.0    0.2   -0.3   -0.1   -0.1
ALBANY              NY      0.7   -1.0    0.0   -0.7   -1.1   -0.8   -0.1    0.4
ALBUQUERQUE         NM      2.1   -0.9   -1.5    0.6    1.4    0.5    0.3    0.3
ALLENTOWN           PA      0.3   -1.5   -1.2   -0.7   -1.4   -1.2   -0.2   -1.1
ANAHEIM             CA     -1.4    0.3   -0.1    3.2   -0.6    0.0   -0.2    1.6
APPLETON-OSHKOSH    WI     -1.0   -1.1   -1.8    0.2   -0.2   -1.5    0.1    0.1
ATLANTA             GA      0.4    0.8    2.3    0.3    0.6   -0.4   -0.8    0.6
AUGUSTA             GA     -0.2    1.5    1.6   -1.3    0.7    0.7   -2.0   -0.8
AUSTIN              TX      1.5    0.7   -0.8    0.9    0.6   -1.5    1.0    0.7
BAKERSFIELD         CA      1.1    0.2   -0.8    0.3   -0.6    3.1   -1.6   -0.3
BALTIMORE           MD      0.4   -0.1    1.9   -0.3    0.7    0.1    1.2   -0.8
BATON ROUGE         LA     -?.1    1.7   -0.2    0.1    1.3    0.2   -0.2    0.7
BEAUMONT            TX     -1.6    1.6    0.3    0.3    0.1    0.?   -1.5   -0.?
BINGHAMTON          NY     -0.8   -2.1   -0.3    0.1   -0.6    0.1   -0.9   -0.6
BIRMINGHAM          AL      0.4    0.9    0.2   -0.8   -0.0   -0.3   -0.2   -1.6
BOSTON              MA      0.3   -0.1    0.6   -1.2   -1.3    0.3    0.3    1.6
BRIDGEPORT          CT     -1.2   -0.6    0.5    0.1   -0.5   -0.0    0.3    0.5
BUFFALO             NY     -0.5   -0.5    0.3   -1.0   -0.3    0.6    0.9   -0.2
CANTON              OH     -1.2   ?1.0   -0.1    0.1   -0.1   -1.1   -0.2   -0.9
CHARLESTON          SC      0.9    1.7    0.4   -1.7    1.5    0.3   -0.4   -0.3
CHARLOTTE           NC      0.3    1.0   -0.1    0.3    0.7   -1.1    0.1   -0.3
CHATTANOOGA         TN     -0.4    1.5    0.5   -1.1   -0.1    0.3   -0.8   -0.6
CHICAGO             IL     -0.4   -0.0    1.2   -0.2    0.1   -0.2    2.3    0.1
CINCINNATI          OH     -0.1   -0.2    0.6   -0.6    0.0   -0.4   -0.1   -0.6
CLEVELAND           OH     -0.4   -0.2    1.8   -0.3   -0.2   -0.1    0.3    0.0
COLUMBIA            SC      0.3    1.5   -0.1   -0.3    0.7   -0.9   -0.3   -0.2
COLUMBUS            OH      0.5   -0.8    0.2   -0.2    0.7   -0.8    0.7    0.4
CORPUS CHRISTI      TX      0.7    0.?   -2.2    0.6    1.5    0.3    1.2   -1.3
DALLAS              TX     -0.4    1.2    0.4    1.1    0.4   -0.4    0.0    0.7
DAVENPORT-R.I.      IA     -0.7   -0.5   -0.8    3.1    0.1   -0.3   -0.7   -0.1
DAYTON              OH     -0.9   -0.6    1.3    0.3    0.8   -0.7   -0.2   -0.3
DENVER              CO      1.0   -0.6    0.4    0.1    0.1   -0.1    0.1    1.2
DES MOINES          IA      1.0   -0.5   -0.3   -0.5    0.1   -1.0   -0.2    1.1
DETROIT             MI     -?.?    0.4    2.3   -0.1    1.1    1.3    1.3    0.1
DULUTH              MN      1.4   -1.4   -1.7   -1.8   -0.4   -0.2   -1.3    0.8
EL PASO             TX      1.2   -1.0   -1.3    1.0    3.3    0.1    0.4   -2.1
ERIE                PA     -0.5   -0.7   -0.8   -0.8   -0.3   -0.8    0.2   -0.2
FLINT               MI     -3.0   -0.1    0.5    0.3    1.7    1.2   -0.2   -0.6
FORT LAUDERDALE     FL        ?    2.4   -0.3    3.4   -3.3   -2.1    1.4    0.4
FORT WAYNE          IN     -1.2   -0.?   -0.2    0.2    1.0   -1.3    0.1    0.3
FORT WORTH          TX     -1.6    1.1    0.6    1.0    0.5   -0.5   -0.6    0.1
FRESNO              CA      0.9      ?   -1.4    0.1   -0.7    3.2   -0.0   -0.8
GARY                IN     -0.8   -1.0    1.2    0.5    1.5   -0.1    0.0   -2.4
GRAND RAPIDS        MI     -1.5    0.1   -0.4   -0.3   -0.2    0.2    0.0    0.4
GREENSBORO          NC     -0.3    0.6   -0.5    0.2    0.2   -1.1   -0.1   -1.6
GREENVILLE          SC     -1.0    1.2    0.2   -0.1    0.2   -1.1   -1.8   -1.3
HARRISBURG          PA      0.7   -0.5    0.7   -0.7   -1.1   -0.8   -0.9   -0.2
HARTFORD            CT     -0.4   -0.3    1.3   -0.?   -0.6   -0.4   -0.2    0.9
HONOLULU            HI      1.7   -0.2   -0.3    0.1    2.2   -1.1   -0.2    2.3
HOUSTON             TX      0.1    0.8    0.3    1.3    0.3   -0.3    0.3   -0.2
HUNTINGTON-ASHLAND  WV     -?.3    3.1   -1.5   -1.3    0.0    0.3   -0.2   -0.8
INDIANAPOLIS        IN     -0.7   -0.0   -0.7   -0.0    0.9   -0.8    2.1    0.1
JACKSON             MS      0.8    2.2   -0.9   -0.7    0.8   -0.2   -0.4   -0.1
JACKSONVILLE        FL      1.7    1.1    0.2    0.1    0.6   -0.6   -1.0   -0.5
JERSEY CITY         NJ      0.6   -1.6   -0.3   -0.8   -0.2    0.7    3.0   -2.6
JOHNSTOWN           PA      0.2   -1.1   -0.9   -1.6   -1.4    0.3   -0.9   -1.7
KANSAS CITY         MO     -0.3    0.5    0.6   -0.5    0.1    0.3   -0.3    0.9
KNOXVILLE           TN      ?.4    0.7   -1.5   -1.2   -0.1   -0.4    0.7   -0.4
LANCASTER           PA     -0.2   -1.2   -0.0   -0.2   -1.2   -1.5   -0.7   -1.2
LANSING             MI     -1.?   -0.4   -0.2    0.3    0.9    0.4   -0.1    1.1
LAS VEGAS           NV      0.5   -0.2   -0.3    2.0    1.0    0.2   -0.6    1.2
LITTLE ROCK         AR      0.5    1.9   -0.9   -0.9    0.3   -0.7      ?    0.5
LORAIN-ELYRIA       OH     -0.?   -2.?   -0.0    0.6    1.2   -1.?   -0.?   -1.4
LOS ANGELES


Appendix B
METHODOLOGY


EXPLORATORY  DATA  ANALYSIS 

The  techniques  we  have  used  to  classify  and  investigate  relationships  among 
the  various  metropolitan  areas  arise  out  of  a  branch  of  statistics  known  as  "explora- 
tory data  analysis,"  which  has  the  drawbacks  and  advantages  of  reliance  on  both 
statistical  methodology  and  expert  judgment.  Statistical  methodology  enables  us  to 
reduce  masses  of  data  to  a  humanly  comprehensible  picture  and  to  clear  away 
known  structure  so  that  subtle  relationships  show  up  clearly.  Expert  judgment 
ensures  that  the  analysis  is  confined  to  meaningful  input  and  helps  separate  genuine 
relationships  from  statistical  coincidences.  Exploratory  data  analysis  must  be  an 
interactive  process  to  be  successful. 

Compared  with  classical  data  analysis,  the  exploratory  approach  is  an  unusual 
procedure.  Classically,  we  would  postulate  one  or  perhaps  a  handful  of  models, 
estimate  some  parameters,  and  then  test  for  goodness  of  fit.  Expediency  and  other 
practical  considerations  demand  that  we  give  up  the  classical  procedures,  and  with 
them  total  objectivity  and  the  right  to  make  accurate  probability  statements  about 
the  statistical  significance  of  our  results.  We  have  125  cities  to  collect  data  on,  and 
the  Census  provides  thousands  of  measurements  for  each.  Some  of  these  measure- 
ments will  appear  related  to  others  by  chance  alone,  some  are  related  by  the  way 
they  were  constructed,  and  others  may  not  appear  to  be  strongly  collinear  but  still 
contain  a  hitherto  unsuspected  useful  relationship.  We  need  factual  input  of  the 
type  data  alone  can  provide  to  direct  future  model  building  in  useful  directions.  It 
is  precisely  this  need  that  makes  this  study  (and  the  Rand  Urban  Project  as  a  whole) 
necessary. 

Since  many  readers  will  be  unfamiliar  with  the  philosophy  of  exploratory  data 
analysis,  we  first  discuss  a  simple  unrelated  example.  Suppose  we  wanted  to  forecast 
population  totals  for  the  United  States  in  the  next  several  years.  Step  one  is  to  obtain 
some  understanding  of  the  history  of  the  population  totals,  which  might  proceed  as 
follows.  Begin  by  plotting  the  data,  which  would  expose  an  accelerating  upward 
trend.  That  is  not  arguable — if  a  fixed  proportion  of  the  population  reproduces,  the 
magnitude  of  population  increase  will  be  proportional  to  the  population.  After  con- 
verting population  to  logarithms,  we  might  try  a  linear  regression  against  time. 
Attention would be focused on the residuals from the regression. These turn out to
be high in postwar years and low in recent times. Perhaps it would then occur to
us  that  we  should  be  looking  at  birth  rates  instead  of  population  counts.  We  can 
smooth  these  birth  rates  (revealing  empirical  trends)  or  try  to  predict  them  from 
other  considerations,  and  delve  even  deeper.  Each  expression  of  the  data  leads  to 
ideas  that  permit  a  simpler  or  better  expression  of  the  data. 
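
The first steps of this example (take logarithms, fit a line against time, then study the residuals) can be sketched in a few lines. The sketch below is illustrative only: the series is synthetic (a hypothetical steady growth rate with an artificial surge in the middle years), not Census data, and the helper name `fit_line` is our own.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Synthetic yearly series: proportional growth plus a temporary surge
# (a stand-in for a postwar rise in births). Not real population data.
years = list(range(1930, 1971))
pop = []
level = 100.0
for year in years:
    rate = 0.013 + (0.006 if 1946 <= year <= 1960 else 0.0)
    level *= 1.0 + rate
    pop.append(level)

# Proportional growth becomes a straight line after taking logarithms.
log_pop = [math.log(p) for p in pop]
a, b = fit_line(years, log_pop)

# The residuals, not the fitted line, are what the analyst looks at next.
residuals = [lp - (a + b * year) for year, lp in zip(years, log_pop)]
for year, res in zip(years[::5], residuals[::5]):
    print(year, round(res, 4))
```

A pattern in the printed residuals (rather than random scatter) is the signal that a different expression of the data, such as growth rates, is worth trying.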

Turning  to  the  cities,  we  found  that  the  main  problem  is  data  reduction.  Our 
eyes  are  overwhelmed  by  125  cities  and  53  variables  (even  carefully  selected  ones). 
Association  on  both  coordinates  of  our  125  x  53  data  matrix  is  a  must.  With  respect 
to  the  variables,  we  have  chosen  factor  analysis;  with  respect  to  the  cities,  cluster 
analysis.  This  is  not  because  we  especially  believe  the  assumptions  underlying  the 
factor  model  or  the  cluster  model,  but  because  the  output  from  these  algorithms  is 
plausible  and  simple.  In  addition,  these  methods  are  becoming  standard  to  the  field 
of  urban  demography,  which  makes  comparisons  with  other  work  in  the  area  easier. 
It  should  be  noted  that  in  our  work,  as  in  the  other  studies  in  this  area,  the  relation- 
ships developed  are  between  areas,  not  individuals,  and  small  areas  like  Salinas  are 
given  equal  weight  with  New  York  City.  If  different  units — say,  census  blocks — were 
chosen,  the  results  would  be  quite  different. 


FACTOR  ANALYSIS 

Factor analysis works on the supposition that many separate measurements
can be well expressed in terms of a few unknown "factors." For mathematical
readers, if V_{ij} is the ith measurement on city j,

     V_{ij} = \sum_{p=1}^{f} a_{ip} F_{pj} + U_{ij} ,                          (1)

where  F_{pj} = "factor score" for city j, factor p,
       a_{ip} = "factor loading" for variable i, factor p,
       U_{ij} = residual, or unexplained part of variable i, city j,
       f      = number of factors.

There are many sophisticated criteria for a good fit for a_{ip} and F_{pj}, but they
essentially amount to this: We want to make the residuals |U_{ij}| small without
making the number of factors too large.

There is a useful indeterminacy in the formulation (1) because a_{ip} and F_{pj} are
both created as part of the solution. To see what this is, rewrite (1) in matrix
notation:

     V = AF + U .                                                              (2)

For any nonsingular f x f "rotation" (orthonormal) matrix R, we can re-express (2)
as

     V = (AR)(R'F) + U
       = \tilde{A} \tilde{F} + U ,

where \tilde{A} = AR and \tilde{F} = R'F.
In the factor analysis we used (program BMD03M of the UCLA biomedical package),
R was chosen to make the rows of F (factors) orthonormal and simultaneously
to make the elements of A as close to zero or ±1 as possible. The variables V_i (the
rows of V) are initially adjusted to have unit length (giving each equal weight
independent of the units of measurement). Also, the factors have unit length by
construction, so A is restricted by

     \sum_{p=1}^{f} a_{ip}^2 \le 1 .

In fact, the factor loading a_{ip} can be interpreted as the correlation between variable
i and factor p.
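
The rotational indeterminacy can be checked numerically. The sketch below uses small made-up matrices (the loadings `A`, the scores `F`, and the 30-degree rotation are arbitrary illustrative choices, not values from this study) and confirms that the fitted part AF is unchanged when A is replaced by AR and F by R'F.

```python
import math


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*x)]


# Hypothetical loadings (3 variables x 2 factors) and scores (2 factors x 4 cities).
A = [[0.9, 0.1], [0.8, -0.2], [0.1, 0.7]]
F = [[1.0, -0.5, 0.3, -0.8], [0.2, 1.1, -0.9, -0.4]]

# A plane rotation is orthonormal: R times its transpose is the identity.
theta = math.radians(30.0)
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

fit_original = matmul(A, F)
fit_rotated = matmul(matmul(A, R), matmul(transpose(R), F))

# Largest absolute difference between the two fitted matrices.
max_gap = max(abs(u - v)
              for row_u, row_v in zip(fit_original, fit_rotated)
              for u, v in zip(row_u, row_v))
print(max_gap)
```

Because the fit is identical for every such R, the data alone cannot choose among rotations; a criterion such as simplicity of the loadings has to be supplied from outside.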

Transformations  of  Variables 

Behind  the  justification  for  factor  analysis  as  it  is  currently  practiced  lies  a 
hidden  assumption  of  normality.  If  the  data  happen  to  contain  extreme  outliers, 
these  outliers  dominate  the  analysis.  Special  factors  are  created  to  explain  only  the 
outliers,  and  since  the  associated  residuals  are  greatly  reduced,  the  analysis  appears 
to  be  functioning  well.  For  variables  where  outliers  seem  to  exist,  we  have  selected 
a  transformation  to  pull  in  those  outliers.  Almost  always  there  is  a  transformation 
with  some  theoretical  justification  as  well  as  desirable  statistical  properties,  such  as 
taking  logarithms  of  population. 

Variables  having  limited  ranges  in  their  natural  form  of  expression  also  require 
transformation.  Percentages  are  a  frequent  example  of  this.  Without  transforma- 
tion, the  difference  between  1  percent  and  2  percent  is  treated  equally  to  the  differ- 
ence between  51  percent  and  52  percent.  A  commonly  used  transformation  for 
resolving  this  difficulty  is: 

     S = \frac{1}{2} + \frac{1}{\pi} \arcsin(2p - 1) ,

where p is a percentage (expressed as a proportion between 0 and 1) and S is the
transformed value. If p is a binomial mean, then this is a "variance-stabilizing
transformation." The action of this transformation is represented in the following
table:

      %       S            %       S

      0     0.000         50     .500
      1      .064         60     .565
      2      .090         70     .631
      5      .129         80     .705
     10      .204         85     .747
     15      .253         90     .796
     20      .295         95     .871
     30      .369         98     .910
     40      .435         99     .936
     50      .500        100    1.000
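
The transformation is easy to evaluate directly. The function below is a minimal sketch of the formula as reconstructed from the table's end points and midpoint (S = 0 at 0 percent, .5 at 50 percent, 1 at 100 percent); the function name `arcsine_scale` is our own.

```python
import math


def arcsine_scale(p):
    """Map a proportion p in [0, 1] to S = 1/2 + arcsin(2p - 1) / pi."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return 0.5 + math.asin(2.0 * p - 1.0) / math.pi


# Print S for a few of the percentages that appear in the table.
for percent in (0, 1, 2, 10, 50, 90, 99, 100):
    print(percent, round(arcsine_scale(percent / 100.0), 3))
```

Evaluated this way, the formula agrees with the tabulated values to within about .001, except at 5 and 95 percent, where it gives about .144 and .856 rather than the printed .129 and .871; those two entries may be worth rechecking against the original.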


Our  policy  with  regard  to  transformations  can  be  summed  up  in  one  sentence: 
Since  factor  analysis  is  a  linear  method,  one  unit  should  be  of  equal  importance  in 
all  ranges  of  the  variable. 

Selection  of  Variables:  Redundancy 

The  selection  of  the  input  variables  themselves  has  a  crucial  bearing  on  both 
the  success  and  the  interpretation  of  the  factor  analysis.  When  many  different 
measures  of  a  community  facet  (such  as  affluence)  are  available,  there  is  a  tendency 
to  overrepresent  them  in  the  factor  analysis.  Why  throw  away  useful  information? 
But  when  we  analyze  the  results,  there  is  also  a  tendency  to  say  that  a  factor  that 
explains  a  lot  of  the  variance  is  an  important  factor.  This  interpretation  is  a  grave 
mistake,  because  the  output  variance  is  approximately  proportional  to  the  input 
variance.  To  see  how  this  happens,  consider  a  large  factor  analysis  in  which  we  add 
equivalent  affluence  measures  one  at  a  time.  At  some  point  an  affluence  factor  will 
appear.  After  that,  each  new  addition,  and  hence  its  variance,  can  be  explained  by 
the  existing  factor,  which  then  increases  in  apparent  importance.  On  the  other 
hand,  if  the  affluence  factor  explains  the  variance  of  other  types  of  variables  (such 
as  climate  or  segregation),  we  have  made  a  significant  discovery.  In  this  study  we 
moderately  limited  the  number  of  affluence  variables  input,  and  they  were  split  up 
into  other  factors. 
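
The redundancy effect can be demonstrated on synthetic data. The sketch below is an assumption-laden illustration, not the procedure used in this study: it substitutes the leading principal component of the correlation matrix (found by power iteration) for a full factor analysis, and all data are randomly generated. It adds more and more noisy copies of one latent "affluence" trait to five unrelated variables and reports the share of total variance carried by the leading component.

```python
import random


def correlation_matrix(columns):
    """Pearson correlations between equal-length lists of numbers."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        standardized.append([(v - mean) / sd for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in standardized]
            for zi in standardized]


def top_share(corr, iterations=500):
    """Share of total variance on the leading component (power iteration)."""
    k = len(corr)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k))
                     for i in range(k))
    return eigenvalue / k


rng = random.Random(0)
n_cities = 125
affluence = [rng.gauss(0, 1) for _ in range(n_cities)]      # latent trait
others = [[rng.gauss(0, 1) for _ in range(n_cities)]        # unrelated variables
          for _ in range(5)]

shares = []
for n_measures in (2, 4, 6, 8):
    # Each measure is the latent trait plus independent noise.
    measures = [[a + 0.5 * rng.gauss(0, 1) for a in affluence]
                for _ in range(n_measures)]
    shares.append(top_share(correlation_matrix(measures + others)))

print([round(s, 2) for s in shares])
```

The share rises with the number of redundant measures even though nothing about the cities has changed, which is why "variance explained" is a poor guide to a factor's importance.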

Conversely,  a  unique  or  an  implicit  variable  may  be  buried  because  of  underre- 
presentation.  Unique  variables  can  be  statistically  identified  by  having  little  of  their 
variance  explained  by  the  important  factors.  A  rough  approximation  to  the  variance 
explained  by  the  key  factors  is  the  variance  explained  by  the  other  variables,  which 
is  called  the  "estimated  communality."  It  is  printed  as  part  of  the  factor  analysis 
output.  Implicit  variables,  however,  have  to  be  guessed.  Consider  the  two  variables, 
"central  city  income"  and  "suburban  income."  Both  are  measures  of  SMSA  income, 
with  suburban  income  generally  higher.  If  we  left  these  as  is,  we  would  learn  nothing 
about  city  income  vs.  suburban  income.  First,  we  would  have  to  examine  many 
numbers  even  to  get  a  quantitative  idea  of  what  the  relationship  looked  like;  and 
second,  the  fact  that  city-suburban  differences  are  generally  much  smaller  than 
inter-SMSA  variation  would  be  translated  into  a  similar  indication  of  importance 
by the factor analysis. This difficulty can be overcome by converting the city and
suburban values into a difference and an average. Then the seemingly trivial practice
of scaling all input variables to have unit variance has a crucial consequence:
The  small  differences  and  large  average  values  are  scaled  to  the  same  size.  We  have 
included  a  number  of  implicit  variables  in  this  study. 
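
A minimal sketch of this conversion follows. The income figures are hypothetical, chosen only so that city-suburb differences are small relative to differences between areas; the helper name `standardize` is our own.

```python
def standardize(values):
    """Rescale a list of numbers to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


# Hypothetical median incomes (dollars) for five metropolitan areas.
city_income = [8200.0, 9100.0, 10400.0, 7600.0, 9800.0]
suburb_income = [9000.0, 9500.0, 11600.0, 7900.0, 11000.0]

# Replace the two overlapping measures with an average and a difference.
average = [(c + s) / 2.0 for c, s in zip(city_income, suburb_income)]
difference = [s - c for c, s in zip(city_income, suburb_income)]

# After scaling, the small differences carry the same weight as the averages.
z_average = standardize(average)
z_difference = standardize(difference)
print([round(z, 2) for z in z_difference])
```

Before scaling, the differences span a few hundred dollars and the averages several thousand; after scaling, both enter the factor analysis with unit variance.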

For  the  purpose  of  dividing  the  cities  into  homogeneous  groups,  it  is  necessary 
to  weight  the  factors  by  their  estimated  importance.  Because  of  redundancy  prob- 
lems, and  because  the  factors  have  been  rotated  for  easy  interpretation,  there  is  no 
straightforward  objective  way  of  determining  their  weights.  The  weights  have  been 
based  on  the  perceived  importance  of  the  class  of  variables  that  make  up  the  factor, 
and  on  the  variety  of  classes.  (In  some  factors — such  as  age,  welfare,  or  education — 
there  is  only  one  idea,  whereas  in  others — such  as  inequality,  poverty,  and  segrega- 
tion— there  are  several.) 

Some  forms  of  data  structure  are  not  amenable  to  factor  analysis.  For  example, 
quadratic  and  more  general  curvilinear  relationships  are  not  simple  generalizations 
of  linear  ones.  Multiple  linear  relationships,  with  cities  of  one  type  on  one  linear 
subspace  and  cities  of  another  type  in  another  subspace,  could  be  completely  unno- 
ticed by  a  strictly  linear  method  like  factor  analysis.  We  are  continuing  to  explore 
various  avenues  of  data  representation  at  Rand. 


CLUSTER  ANALYSIS 

Having  assigned  each  city  a  score  on  each  of  the  factors,  we  then  distributed  the 
cities  into  relatively  homogeneous  groups.  This  enables  us  to  focus  on  individual 
cities  without  succumbing  to  tunnel  vision.  Moreover,  it  becomes  easier  to  distin- 
guish characteristics  unique  to  one  city  from  systematic  differences.  This  is  not  to 
be  confused  with  an  assessment  of  importance,  which  we  leave  to  urban  experts. 

The  two  most  popular  methods  of  cluster  analysis,  "top  down"  and  "bottom  up," 
are  not  well  suited  to  the  data.  "Bottom  up"  begins  with  the  125  cities  as  125  groups 
and  coalesces  groups  based  on  the  average  (or  perhaps  minimum)  distance  between 
groups.15  By  the  time  one  achieves  ten  clusters  (say),  there  are  two  or  three  very 
large  clusters  and  the  rest  contain  one  or  two  cities.  On  the  other  hand,  "top  down" 
starts  with  all  the  cities  in  a  single  cluster  and  successively  divides  them.  The 
difficulty  is  that  once  we  make  a  division  we  are  stuck  with  it. 

As  an  alternative,  we  have  used  a  method  that  fixes  the  number  of  clusters  and 
minimizes  the  sum  of  weighted  squared  distances  of  the  points  to  their  respective 
cluster  centers.  Starting  from  a  random  allocation,  one  looks  at  each  city  in  se- 
quence, assigning  it  to  another  cluster  if  that  reduces  the  sum  of  weighted  squared 
distances.  The  following  well-known  formulas  make  the  checking  easy: 

Let

     \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i .

Let

     V_n = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2 .

Let

     \Delta = X_{n+1} - \bar{X}_n .

Then

     \bar{X}_{n+1} = \bar{X}_n + \Delta/(n+1) ,

     V_{n+1} = \frac{n}{n+1} V_n + \frac{n}{(n+1)^2} \Delta^2 .

15 The cities are represented by points in eight-dimensional space, with coordinates equal to their
factor scores. The weighted distance between cities x = (x_1, ..., x_8) and y = (y_1, ..., y_8) is
(\sum_i (x_i - y_i)^2 W_i)^{1/2}, where W_i is the subjective weight given to factor i.
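
The updating formulas can be checked numerically against a direct calculation. The sketch below assumes the 1/n definition of V_n (a mean squared deviation, not the n-1 form); the data values are arbitrary and the function names are our own.

```python
def add_point(n, mean, var, x_new):
    """Update the mean and 1/n variance of n points when one point is added."""
    delta = x_new - mean
    new_mean = mean + delta / (n + 1)
    new_var = n / (n + 1) * var + n / (n + 1) ** 2 * delta ** 2
    return n + 1, new_mean, new_var


def direct(values):
    """Mean and 1/n variance computed from scratch."""
    n = len(values)
    mean = sum(values) / n
    return mean, sum((v - mean) ** 2 for v in values) / n


data = [3.0, 7.5, 1.2, 9.9, 4.4]  # arbitrary illustrative values

# Start from a single point (zero variance) and add the rest one at a time.
n, mean, var = 1, data[0], 0.0
for x in data[1:]:
    n, mean, var = add_point(n, mean, var, x)

print(mean, var)
print(direct(data))
```

Since n times V_n is the sum of squared distances to the cluster center, these updates give the change in the clustering criterion from moving one city without recomputing the whole cluster.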


Convergence  to  the  minimum  does  not  always  occur,  although  convergence  to  a 
relative  minimum  (one  from  which  no  single  changes  are  profitable)  is  quite  rapid. 
Although  it  would  be  theoretically  pleasing  to  remove  the  dependence  on  a  random 
initial  allocation,  it  does  not  seem  necessary  since  different  starting  points  usually 
result  in  very  similar  clusterings. 
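
The reassignment procedure can be sketched as follows. This is a simplified illustration under our own assumptions, not the program used for this study: it recomputes the full criterion for every trial move instead of using the updating formulas, and the six two-dimensional points, the equal weights, and all function names are hypothetical.

```python
import random


def weighted_sq_dist(x, center, weights):
    """Weighted squared distance between a point and a cluster center."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, center, weights))


def total_cost(points, labels, k, weights):
    """Sum of weighted squared distances of points to their cluster centers."""
    dims = len(points[0])
    cost = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        center = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        cost += sum(weighted_sq_dist(p, center, weights) for p in members)
    return cost


def cluster(points, k, weights, seed=0):
    """Start from a random allocation; move one point at a time while it helps."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    improved = True
    while improved:
        improved = False
        for i in range(len(points)):
            best_label = labels[i]
            best_cost = total_cost(points, labels, k, weights)
            for c in range(k):
                if c == labels[i]:
                    continue
                trial = labels[:i] + [c] + labels[i + 1:]
                cost = total_cost(points, trial, k, weights)
                if cost < best_cost - 1e-12:
                    best_label, best_cost = c, cost
            if best_label != labels[i]:
                labels[i] = best_label
                improved = True
    return labels


# Two well-separated hypothetical groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
weights = (1.0, 1.0)
labels = cluster(points, k=2, weights=weights)
print(labels)
```

The loop stops at a relative minimum, an allocation that no single move can improve; rerunning with a different `seed` is a simple way to see how much the result depends on the starting allocation.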


Appendix  C 
OTHER  CLASSIFICATIONS 


People  have  been  thinking  about  cities  almost  as  long  as  they  have  been  living 
in them. Earlier analysts have made numerous attempts at classification. In Table
C-1, we show a few of these: by historical cycle of development, by function, by
relation  to  other  cities,  or  by  economic  base.16 

Factor  analytic  classifications  like  ours  do  not  use  theory  directly  but  attempt 
to  describe  as  simply  as  possible  the  patterns  that  emerge  from  the  data  gathered. 
Thus,  no  classification  can  be  absolutely  general — the  final  pattern  depends  on  what 
goes  in.  Other  studies  have  concentrated  on  different  types  of  variables,  and  their 
results  provide  a  useful  supplement  to  ours.  In  what  follows,  we  will  try  to  glean 
additional  insights  from  the  best  recent  factor  analytic  work. 

Hadden  and  Borgatta  collected  65  variables  from  the  1962  City  and  County  Data 
Book  and  the  1960  Census.  They  split  the  towns  over  25,000  into  four  groups  by  size. 
For  cities  over  150,000  they  found  ten  major  factors  (those  in  parentheses  were 
separate  factors  in  the  smaller-town  analysis  that  were  subsumed  in  the  ten  factors 
for  large  cities): 

•  Socioeconomic  Status  (Percent  Nonwhite) 

•  Age  Composition 

•  Educational  Center 

•  Growth  and  Residential  Mobility 

•  Density  (Foreign  Born,  Public  Transportation) 

•  Total  Population 

•  Commercial  Concentration  (Wholesale,  Retail,  Manufacturing) 

•  Durables  Manufacturing  Concentration 

•  Unemployment 

•  Government  Employees 

The list is very similar to ours. Large cities that specialize in durables manufacturing
have low education and fewer white-collar and other occupations; nondurables
manufacturing co-exists with other types of commercial activities. The educational center

16 Two excellent reviews of this field are J. K. Hadden and E. F. Borgatta, American Cities: Their
Social Characteristics, Rand McNally, Chicago, 1965; and B. Berry (ed.), City Classification Handbook.

Table C-1
WAYS OF CLASSIFYING CITIES

              Historical        Functional        Economic Base         Relation to
              Cycle                                                     Other Cities

Author        Mumford           Tower             Ogburn                Kneedler
              Forrester                           Harris
                                                  Alexandersson

Main          Primitive         Commercial        Various manu-         Independent
Categories    Developing        Industrial          facturing types     Suburb
              Metropolis        Political         Retailing             Central city
              Megalopolis       Recreational      Wholesaling
              Chaos or                            Diversified
                stagnation                        Transport
                                                  Mining
                                                  Education
                                                  Military
                                                  Recreation
                                                  Retirement, etc.

These studies, done mainly by geographers, classify all cities by
city-forming activities: those economic activities (mining, furniture
making, and the like) in which the city is more than 20 percent (A-type),
10-19 percent (B-type) or 5-9 percent (C-type) above the national average
of employees for that activity.


factor  is  loaded  mainly  with  "percent  living  in  group  quarters,"  which  we  did  not 
collect.  Socioeconomic  status  includes  our  education  and  inequality  factors.  In  the 
United  States,  density  is  associated  with  foreign-born  population  (in  the  northeast- 
ern cities)  and,  because  of  the  economies  of  concentration,  with  public  transporta- 
tion.17 Criticizing  the  economic  base  studies,  Hadden  and  Borgatta  note  that  the 
numbers  of  persons  employed  in  wholesale  or  retail  trade  correlate  .9  or  higher  with 
population  size  and  ask,  "Does  it  make  any  sense  to  speak  of  cities  specializing  in 
wholesaling,  retailing,  manufacturing,  etc.  if  the  amount  of  each  of  these  activities 
is  directly  proportional  to  the  size  of  the  city?"  The  economies  of  scale  in  the 
provision  of  public  goods  such  as  transportation,  exotic  restaurants,  social  services, 
and  crime  make  size  somewhat  more  important  than  it  appears  in  our  analysis 
where  many  of  these  effects  are  lessened  by  our  use  of  rates. 

Hadden  and  Borgatta  do  an  interesting  analysis  of  the  stability  of  results  under 
alternate  definitions  of  urban  area.  The  point  is  that  there  are  three  definitions  of 
urban  area  used  by  the  Census.  The  so-called  Urbanized  Area  is  exactly  the  densely 
settled  part  and  may  cut  across  various  political  boundaries.  Within  the  urbanized 

17  For  an  interesting  discussion  of  economies  of  concentration,  as  opposed  to  economies  of  scale,  see 
M.  Gaffney,  "Containment  Policies  for  Urban  Sprawl,"  in  Approaches  to  the  Study  of  Urbanization, 
University  of  Kansas,  Lawrence,  1970. 
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area  is  the  legal  central  city,  and  the  sum  of  all  the  counties  that  contain  urbanized 
areas  is  the  Standard  Metropolitan  Statistical  Area  (SMSA).  Many  of  the  western 
SMSAs  contain  great  amounts  of  empty  space,  and  in  1960  SMSA  population  was 
only  81  percent  urban.  Should  we  be  worried  that  our  results,  based  mainly  on  data 
collected  for  SMSAs,  do  not  give  a  true  picture  of  real  urban  patterns?  Hadden  and 
Borgatta  made  two  tests.  First,  they  correlated  the  entries  in  three  national  variable 
correlation  matrixes,  one  for  each  definition  of  urban  area.  The  SMSA  variable 
correlations  correlated  .96  with  those  of  the  urban  areas,  .89  with  those  of  the 
central  cities,  and  the  central  cities  correlated  .92  with  the  urbanized  areas.  Thus 
all  three  definitions  led  to  the  same  patterns  with  the  urbanized  areas  falling  be- 
tween the  two  in  the  variables,  just  as  it  does  physically.  Second,  they  correlated  the 
factor  scores  of  the  three  definitions  of  the  city — that  is,  they  determined  how 
similar  the  SMSAs,  central  cities,  and  urbanized  areas  of  a  certain  area  were,  on  the 
average.  Of  the  factors,  only  density  and  population  growth  had  correlations  below 
.85  between  any  pair  of  definitions  for  the  area.  For  most  purposes,  although  it  is 
important  to  make  sure  that  the  same  definition  is  being  used  in  each  case  of  the 
cross-sectional  analysis,  it  does  not  matter  too  much  which  definition  it  is. 
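
The first of these tests can be sketched in a few lines. The example below is only an illustration under stated assumptions: the data are synthetic, and the function name offdiag_correlation and the variable names are hypothetical rather than taken from Hadden and Borgatta. It builds a correlation matrix for each of two definitions of the same areas and then correlates the corresponding off-diagonal entries.

```python
import numpy as np

def offdiag_correlation(corr_a, corr_b):
    """Correlate the entries above the diagonal of two correlation matrices."""
    a = np.asarray(corr_a, dtype=float)
    b = np.asarray(corr_b, dtype=float)
    if a.shape != b.shape or a.shape[0] != a.shape[1]:
        raise ValueError("need two square matrices of the same shape")
    rows, cols = np.triu_indices(a.shape[0], k=1)  # skip the diagonal of ones
    return float(np.corrcoef(a[rows, cols], b[rows, cols])[0, 1])

# Synthetic stand-in data (an assumption, not the study's data):
# 200 areas measured on 6 variables under two definitions of "urban area".
rng = np.random.default_rng(0)
n_areas, n_vars = 200, 6
smsa = rng.normal(size=(n_areas, n_vars))
smsa[:, 1] += 0.8 * smsa[:, 0]   # build in some correlation structure
smsa[:, 3] -= 0.5 * smsa[:, 2]
urbanized = smsa + 0.3 * rng.normal(size=(n_areas, n_vars))

corr_smsa = np.corrcoef(smsa, rowvar=False)
corr_urbanized = np.corrcoef(urbanized, rowvar=False)
similarity = offdiag_correlation(corr_smsa, corr_urbanized)
print(f"correlation between the two sets of correlations: {similarity:.2f}")
```

A value near 1 means the two definitions produce nearly the same pattern of relationships among the variables, which is the sense in which the .96, .89, and .92 figures above should be read.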

An  interesting  contrast  to  the  American  work  is  given  by  Moser  and  Scott.18 
Reflecting  England's  greater  homogeneity,  they  found  that  they  could  account  for 
60  percent  of  the  variance  of  60  variables  with  only  four  factors.  These  were  social 
class,  age  of  the  area  and  growth  1931-1961,  recent  growth,  and  housing  conditions. 
In  Britain,  status  and  demography  are  merged;  the  highest  status  communities  are 
older,  with  smaller  families  and  generally  in  exclusive  suburbs  or  resorts.  Their 
clustering  presented  three  main  groups:  resorts  and  administrative  and  commercial 
centers,  industrial  towns,  and  suburbs.  London  was  too  distinct  to  be  placed  in  any 
cluster. 

Mayer's typology of 1960 SMSAs used 66 variables, mainly from the 1960 Census.19
Five major factors emerged: socioeconomic status, age and size, stage in life
cycle, recent growth, and nonmanufacturing. One interesting minor factor contained
percent white, low rainfall, and elevation above sea level. Using these factors,
he obtained the typology shown in Table C-2. Some differences between his work and
ours are caused by the fact that he used all 212 SMSAs and more classes. Others
are caused by our stress on suburban and city differences and neglect of size per se,
and his subjective approach to classification. It is interesting to place his types into
our  clusters  to  see  where  the  differences  are.  With  the  exception  of  our  split  of 
southern  and  big  cities  into  those  with  differentiated  and  undifferentiated  suburbs, 
all  of  Mayer's  types  are  combined  to  form  our  clusters.  For  example,  his  Aa  New 
England  and  C  Mining  towns  combine  to  form  our  "declining  white  areas." 

Meyer  classified  145  SMSAs  by  characteristics  of  their  nonwhite  populations.20 
He  found  some  interesting  relationships  between  status  and  age  reflected  in  the 
regions.  First,  the  prosperous  small  northern  industrial  SMSAs  had  high  status, 
young  black  families.  In  these  cities,  black  males  hold  relatively  good  manufacturing 

18  C.  A.  Moser  and  W.  Scott,  British  Towns,  Oliver  and  Boyd,  Edinburgh,  1961. 

19  H.  M.  Mayer,  unpublished  report  cited  by  B.  Berry  and  E.  Neils,  "Location,  Size,  and  Shape  of 
Cities,"  in  H.  S.  Perloff  (ed.),  The  Quality  of  the  Urban  Environment,  Johns  Hopkins  Press,  Baltimore, 
1969. 

20  D.  Meyer,  "Classification  of  U.S.  Metropolitan  Areas  by  the  Characteristics  of  Their  Nonwhite 
Populations,"  in  B.  Berry  (ed.),  City  Classification  Handbook. 
Table C-2
MAYER'S  TYPOLOGY  OF  SMSAS,  1960 


A.  New  England,  eastern  New  York,  and  New  Jersey  cities 

Intermediate  to  higher  SES,  older  and/or  larger,  slow  growth 
1950-60,  substantial  commercial  orientation,  foreign  born 
population, substantial use of public transport and cross-commuting.

Aa.    New England subgroups (e.g., Fall River, New Bedford)
Low  status,  older  residual  populations,  crowding,  etc. 

Ab.     New  York  (special  case — modest  status,  old,  large,  commercial 
orientation,  foreign  born,  public  transport,  etc.). 

B.  Manufacturing  belt  cities 

Older  and/or  larger,  industrial,  slow  growth  1950-60,  high 
density,  substantial  foreign  born,  use  of  public  transport. 

C.  Mining  towns  (Pennsylvania,  West  Virginia,  Duluth) 

Low  SES,  older  populations,  substantial  use  of  group  quarters, 
public  transportation. 

D.  Cities  of  agricultural  Midwest  and  Plains 

Younger  populations,  slow  growth  1950-60,  commercial  orienta- 
tion, relative  isolation,  little  use  made  of  public  transport. 

Da.     Chicago  (special  case — older,  larger,  manufacturing). 

E.  Smaller towns of Pennsylvania, Ohio, Southern Indiana, and Border South

Average  or  modest  on  all  factors,   few  foreign  born,  somewhat 
older  population,  weaker  commercial  bases. 

F.  Larger  Mason-Dixon  line  cities,  plus  Atlanta,  Richmond,  Roanoke 

Some  manufacturing,  younger  populations,  slower  growth,  fewer 
foreign  born. 

G.  Southern  cities 

Low  SES,  young  populations,  growing,  weak  commerce,  few  foreign 
born,  substantial  Negro  population. 

H.  Florida 

Older  populations,  rapid  growth,  commercial,  many  foreign  born, 
relatively isolated, low density.

I.  Texas  and  Arizona 

Ia.     Texas Gulf coast

Low  density,  substantial  Negro  populations  and  institutional 
or  military  base.     Populations  youngish,  few  foreign  born. 

Ib.    Mexican  border  towns 

Very  low  SES,  very  young  populations,  commercial,  many  foreign 
born,  many  institutional,  military. 

Ic.     West Texas and Arizona

Higher  SES,  younger  populations,  very  rapid  growth,  automobile- 
oriented,  low  density. 

J.    Mountain  States  cities 

Young  cities,  young  populations,  commercial,  few  Negroes, 
relatively  distant. 

Ja.    Denver  and  Colorado  Springs 

Same except larger, growing more rapidly, more use of public
transport.

K.    West  Coast  cities 

Higher  SES,  commercial,  substantial  military  involvement. 

Ka.     Los  Angeles  (special  case — older,  larger,  more  rapid  growth, 
less  commerce,  absence  of  public  transport). 

L.    Other  groups 

La.     Principal  "institutional"  metropolitan  areas — Ann  Arbor, 
Champaign-Urbana,  Lawton. 

Lb.     Las  Vegas 

Lc.     Midland-Odessa

Ld.     Honolulu
jobs, and their demographic characteristics are quite similar to those of their white
blue-collar counterparts. The blacks in older declining industrial cities have high
unemployment and older housing. The blacks in the largest cities are older but relatively
prosperous. A high percentage of women head families and hold jobs. The west coast
cities are distinguished by nonwhites with high education and high-status occupations.
Many of these are Asian-Americans. The blacks in southern SMSAs are
young, employed, but very poor. These are rural inmigrants. The job structure offers
mainly low-paying jobs to blacks, but unlike in Texas, the blacks do not have to
compete with anyone for them. The Texan and other southwestern cities have older
but equally poor black populations with much higher unemployment. There is not
as much rural inmigration, and although high-paying jobs are not available, there
is competition with Mexicans and Indians for the low-prestige jobs.

Although  there  has  been  a  plethora  of  city  classifications,  not  much  has  been 
done  with  them.  We  next  examine  two  studies  of  how  useful  classifications  are  in 
explanations. 

Schnore and Winsborough tested the usefulness of Forstall's functional classification
by degree of manufacturing against the manufacturing ratio itself in predicting
suburban-ghetto contrasts.21 They give a simple regression for one measure of the
contrasts:

    Income Suburb/City = .24 Age City
                         - .29 (City Population / Urban Area Population)
                         + .17 City % Black
                         + .19 Lacking Plumbing
                         + .31 Manufacturing Ratio

In  this  case,  dummies  representing  the  classification  add  almost  nothing  to  the 
regression.  The  standardized  regression  coefficients  are  interesting  as  each  gives 
support  to  a  different  explanation  of  contrasts  between  suburbs  and  ghettos.  Age  of 
city  is  related  to  the  developmental  theory  that  new  cities  with  the  rich  in  the  center 
age  into  those  with  some  poor  in  the  center  and  finally  into  those  with  only  poor  in 
the  center.  As  central  city  housing  ages,  Tucson  becomes  Los  Angeles  and  finally 
New  York.  The  city  population  dominance  may  mean  that  it  is  too  difficult  to  get 
away;  the  distance  is  too  far,  or  the  suburbs  are  not  very  well  developed.  The  percent 
black  in  the  central  city  can  be  interpreted  as  driving  richer  whites  out,  or,  because 
of  segregation,  barring  blacks  from  the  suburbs.  Lacking  plumbing  is  a  proxy  for 
poor  housing,  which  may  be  the  cause  or  effect  of  fewer  rich  people  in  the  city. 
Finally,  the  manufacturing  ratio  supports  the  theory  that  dirt,  noise,  and  so  on 
associated  with  manufacturing  drive  those  who  can  afford  it  into  the  suburbs. 
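
Because the argument compares standardized coefficients, it may help to recall how such coefficients are obtained. The sketch below is a hedged illustration on synthetic data, not a reconstruction of the Schnore and Winsborough regression; the function name standardized_coefficients and the two made-up predictors are assumptions. Each variable is converted to z-scores before fitting, so that every coefficient is expressed in standard deviation units and predictors measured on different scales can be compared.

```python
import numpy as np

def standardized_coefficients(X, y):
    """Return OLS slopes after z-scoring every predictor and the response."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    # z-scored variables have mean zero, so no intercept column is needed
    betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return betas

# Synthetic predictors on very different scales (illustrative only).
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([rng.normal(50, 10, n),      # e.g. an age-like measure
                     rng.normal(0.3, 0.05, n)])  # e.g. a ratio-like measure
y = 0.02 * X[:, 0] + 6.0 * X[:, 1] + rng.normal(0, 0.2, n)

betas = standardized_coefficients(X, y)

# Same quantities from the raw regression: slope * sd(x) / sd(y).
A = np.column_stack([np.ones(n), X])
raw_slopes = np.linalg.lstsq(A, y, rcond=None)[0][1:]
rescaled = raw_slopes * X.std(axis=0) / y.std()
print(betas, rescaled)
```

The raw slopes here differ by a factor of several hundred, yet the standardized coefficients are of similar size, which is why coefficients such as .24 and .31 above can be set side by side.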

Clark tested factors against discrete variables in predicting some measures of
political activity: League of Women Voters membership, reform government,
decentralization, urban renewal, and general expenditures.22 The final factor analysis
predictions  were  never  as  good  as  those  using  discrete  variables,  because  of  the 
ambiguity  of  interpretation  and  muddling  of  effects.  Nevertheless,  he  concludes  that 
factors  are  useful.  The  primary  reason  is  that  they  are  orthogonal.  Even  when  a 

21  L.  F.  Schnore  and  H.  Winsborough,  "Functional  Classification  and  the  Residential  Location  of 
Social Classes," in B. Berry (ed.), City Classification Handbook; R. L. Forstall, "Economic Classification
of  Places  over  10,000,  1960,"  The  Municipal  Yearbook  1967,  ICMA,  Chicago,  1967. 

22  T.  Clark,  "Urban  Typologies  and  Political  Outputs,"  in  B.  Berry  (ed.),  City  Classification  Handbook. 
great number of factors are included, there are no multicollinearity problems, and
thus data exploration is more efficient. In addition, factors permit conclusions about
classes of variables as opposed to this or that specific measure. Clark could claim that
although socioeconomic characteristics are important in explanations of government
outputs, the form of government and decentralization remain independently
important.


Appendix  D 


CROSS  SECTIONAL  ANALYSIS:  GROWTH  AND  DECLINE 

OF  URBAN  AREAS 


To  demonstrate  how  our  data  file  can  be  used  in  testing  theories  we  will  present 
some  preliminary  work  on  growth  and  decline  of  urban  areas.  The  city  of  St.  Louis 
is  losing  population  rapidly.  A  number  of  theories  can  be  advanced  to  explain  this 
phenomenon,  but  we  were  particularly  concerned  with  the  theory  that  the  problem 
was  an  insufficient  rate  of  economic  growth  for  the  area.  We  wanted  to  study  the 
nationwide relationships between economic and population growth. Since the national
employment rate is about 95 percent, it is difficult to separate the two in a
given area; further, even if they are not moving to jobs as "recruited migrants,"
inmigrants generate retail, wholesale, and service jobs to cater to them. The correlation
of area population growth to total income growth, as given by the Survey of
Current Business, is .86. This mutual dependence and high collinearity led to a
simultaneous equation model. The variables used in the model, with their means
and standard deviations, are given in Table D-1.

The assumptions of the model are shown in Figure D-1, in which the arrows
indicate  influence.  For  example,  it  is  assumed  that  SMSA  total  income  growth  and 
population  growth  are  jointly  determined,  and  that  SMSA  population  growth 
together  with  other  exogenous  variables  determine  central  city  population  growth. 
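
When two variables are jointly determined in this way, ordinary least squares applied to either equation is biased, and an instrumental method such as two-stage least squares is one standard remedy. The following is a minimal sketch on synthetic data, not the estimation code behind this appendix. The helper two_stage_least_squares, the coefficient values, and the names y1, y2, z1, and z2 are all assumptions; y1 and y2 can be read loosely as income growth and population growth, and z1 and z2 as exogenous variables that enter only one equation each.

```python
import numpy as np

def two_stage_least_squares(y, endog, exog, instruments):
    """2SLS for y = b0*endog + b1 + b2*exog + error.

    Stage 1 regresses the endogenous regressor on all exogenous variables;
    stage 2 regresses y on the stage-1 fitted values and the included
    exogenous variables. Returns [b0, b1, b2].
    """
    const = np.ones(len(y))
    Z = np.column_stack([const, exog, instruments])
    first_stage, *_ = np.linalg.lstsq(Z, endog, rcond=None)
    endog_hat = Z @ first_stage
    X_hat = np.column_stack([endog_hat, const, exog])
    coef, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return coef

# Synthetic two-equation system (all values are illustrative assumptions):
#   y1 = a*y2 + z1 + u1
#   y2 = c*y1 + z2 + u2
rng = np.random.default_rng(2)
n = 5000
a, c = 0.9, 0.5
z1, z2 = rng.normal(size=n), rng.normal(size=n)
u1 = rng.normal(scale=0.5, size=n)
u2 = rng.normal(scale=0.5, size=n)
det = 1.0 - a * c
y1 = (z1 + u1 + a * (z2 + u2)) / det      # reduced form for y1
y2 = (c * (z1 + u1) + z2 + u2) / det      # reduced form for y2

coef = two_stage_least_squares(y1, y2, exog=z1, instruments=z2)
print("2SLS estimate of a:", coef[0])
```

The variable z2 serves as the instrument because it affects y1 only through y2; in the model above, the exogenous variables that appear in one growth equation but not the other would play that role.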

The  results  for  124  urban  areas23  and  for  the  59  largest  cities  are  shown  in  Table 
D-2.24  For  the  124  areas  case,  income  growth  and  SMSA  population  growth  are 
closely  linked,  but  the  South  and  other  poor  and  poorly  educated  areas  are  catching 
up  in  income.  This  leveling  is  to  be  expected  as  national  influences  become  more 
important  on  local  areas.  In  addition,  we  see  that  Congressional  power,  stronger  city 
governments,  and  manufacturing  have  added  to  income  growth.  Natural  increase 
and  a  good  climate  have  an  independent  effect  on  population  growth.  There  are  two 
reasons  why  poorer  areas  were  catching  up  in  income  in  the  decade.  Poor  people 
continued  to  migrate  out,  and  many  of  those  who  stayed  improved  their  relative 
position. Unfortunately, this model does not allow us to assess the relative importance
of the reasons. Central city change is mainly influenced by SMSA population

23  Honolulu  appears  to  be  a  special  case  and  was  dropped  from  the  analysis. 

24  Because  the  model  is  a  system  of  simultaneous  equations,  two-stage  least  squares  was  used  to 
estimate  the  parameters. 




Table D-1

SELECTED STATISTICS FOR 124 METROPOLITAN AREAS(a)

Variable                                                                 Standard
Number       Name                                 Abbreviation  Mean     Deviation

             Total Income Growth                  EcGro         .0709    .015
4            SMSA Population Change(b)            SMSCh         .0175    .014
             Central City Population Change       CC Ch         .0070    .019
             Congressional Power(c)               CONG P        .24      .45
[illegible]  Type of City Government              C GOV         2.60     1.30
14           Manufacturing Ratio(d)               MANUF         .34      .08
17           High School Graduates(d)             HSG           .53      .05
             Age of City(e)                       Age C         2.03     .31
8            Gini Coefficient                     Gini          .341     .028
[illegible]  Federal Employees(d)                 FED EM        .07      .03
             South (Dummy)                        SOUTH         .32      .47
             Birth Rate minus Death Rate, 1968    Nat Inc       .0089    .003
23           Unemployment, 1970(d)                Unemp         .13      .018
9            Climate                              Clim          -6.8     9.5
             Old in Central City, 1960 (decile)   CC Old        4.9      2.5
             Density Central City (log)           DENSC         3.73     .26
             Black Central City, 1960(d)          Black         .24      .12

(a) Data base minus Honolulu, which doesn't seem to fit into the same pattern.
(b) Annual rate of growth, 1960-1970.
(c) This variable is 1 if a key congressman or senator has city as a base.
(d) Transformed by variance preserving transformation (see Appendix B).
(e) The population of the city in (1890+1910+1930)/1960.


change,  but  older  cities  with  more  old  or  black  citizens  lost  more  population,  even 
with  SMSA  population  change  taken  into  account. 

The  story  is  different  when  only  cities  over  200,000  population  are  considered. 
The  great  size  of  these  areas  is  important  to  interpretation.  They  are  much  more 
alike  than  smaller  areas.  Since  they  are  generally  the  complete  world  of  their 
citizens,  they  provide  in  a  determined  way  the  necessary  range  of  economic  services. 
Their  size  makes  change  more  difficult.  Thus,  Congressional  power  is  apparently 
diluted  to  insignificance,  and  city  government,  manufacturing  ratio,  and  federal 
employees  also  lose  significance.  Climate  loses  its  importance  in  predicting  SMSA 
population  change,  and  age  of  city  is  less  important  in  predicting  central  city  change. 
SMSA  population  growth,  the  one  force  big  enough  to  make  a  difference,  appears 
to  have  a  multiplier  effect  on  income  growth.  With  these  exceptions,  the  basic 
pattern  remains  the  same. 

What  do  these  results  imply  about  central  city  decline  in  St.  Louis?  Using  the 
124 city regression results, we can estimate the effect of different St. Louis characteristics
on the growth rate. The city has been losing population at a 2 percent annual
[Fig. D-1. Model of metropolitan growth. Diagram not reproduced; arrows indicate influence.
Growth variables: Total Income Growth 1960-1970; SMSA Population Growth 1960-1970;
Central City Population Growth 1960-1970.
Other variables shown: Congressional Power, Manufacturing Ratio, % High School Graduates,
% Federal Employees, Unemployment; South, Gini (Inequality); Climate, Births - Deaths 1968;
Age of City, City Government; Black, Spanish in City 1960; Old People in City 1960.]
Table D-2
A MODEL OF GROWTH AND DECLINE(a)


124 Urban Areas(b)

EcGro(c) =  .95 SMSCh + .0035 Cong P + .0015 C Gov + .025 Manuf - .023 HSG + .0072 Age C
            (10)(d)     (2.9)          (3.5)         (1.9)        (1.4)      (2.2)

          + .082 Gini + .0055 South + .017 Fedem - .039 Unemp + .13
            (2.3)       (2.8)         (.7)         (1.1)        (.6)

            R2 = .85    S.E. = .00565

SMSCh    =  .93 EcGro - .042 Gini - .007 South + .37 Nat Inc + .0002 Clim - .0342
            (11)        (+1.2)      (3.4)        (1.9)         (2.25)       (2.2)

            R2 = .80    S.E. = .0063

CC Ch    =  .60 SMSCh - .018 Age C - .026 Black + .001 C Gov - .0012 CC Old
            (3.9)       (2.7)        (+2.3)       (1.1)        (2.3)

          + .0084 Densc + .0114
            (1.7)         (.5)

            R2 = .63    S.E. = .0114


59 Urban Areas with Central Cities Over 200,000(d)

EcGro    = 1.59 SMSCh + .018 Manuf - .073 HSG + .01 Age C + .19 Gini - .0048 South
            (6.1)       (.7)         (2.1)      (1.9)       (2.4)      (1.3)

          + .34 Fedem - .082 Unemp + .0003
            (.8)        (1.2)        (.007)

            R2 = .75    S.E. = .0065

SMSCh    =  .82 EcGro - .0038 South + .28 Nat Inc - .042
            (7.4)       (2.2)         (1.2)         (5.2)

            R2 = .75    S.E. = .0052

CC Ch    =  .73 SMSCh - .007 Age C - .026 Black + .0018 C Gov - .0017 CC Old
            (2.9)       (.81)        (1.6)        (1.4)         (2.2)

          + .0057 Densc - .0043
            (.9)          (.15)

            R2 = .59    S.E. = .0103

(a) Estimated by two-stage least squares.
(b) All but Honolulu.
(c) See Table D-1 for explanation of abbreviations.
(d) Values in parentheses are t-ratios. Four variables with t-ratios less than .5 are
not listed. These variables are Congressional Power and Type of City Government in the top
equation, and Climate and the Gini Coefficient in the second equation.
rate, 3 points lower than the 124 city average annual gain of 1 percent. The model
predicts slower growth for the area as a whole because of the lack of Congressional
power, the weak city government, high rates of unemployment, and rather poor
climate. In fact, the SMSA growth rate is .55 percent less than the national average.
The effect of this on the annual rate of city growth is estimated to be .6 x (-.55%)
= -.33%. The other major influences according to the model are St. Louis' age
(-.7%), high % black in 1960 (-.3%), weak city government (-.2%), high median
age in 1960 (-.2%), and high density (-.2%). Adding all these effects, we see that
St. Louis is predicted to have an annual growth rate of about -1%, a slight overestimate.
Low economic growth accounts for only a small part of the central city decline.
Indeed, the analysis reinforces the point that most of the major determinants of big
city problems are not controllable by local officials, or, in fact, by anyone.